Accession Number : ADA572794


Title : A Web-based High-Throughput Tool for Next-Generation Sequence Annotation


Descriptive Note : Conference paper


Corporate Author : ARMY MEDICAL RESEARCH AND MATERIEL COMMAND FORT DETRICK MD TELEMEDICINE AND ADVANCED TECH RESEARCH CENTER


Personal Author(s) : Kumar, Kamal ; Desai, Valmik ; Cheng, Li ; Khitrov, Maxim ; Grover, Deepak ; Satya, Ravi V ; Yu, Chenggang ; Zavaljevski, Nela ; Reifman, Jaques


Full Text : http://www.dtic.mil/dtic/tr/fulltext/u2/a572794.pdf


Report Date : Jun 2011


Pagination or Media Count : 9


Abstract : The availability of a large number of genome sequences, resulting from inexpensive, high-throughput next-generation sequencing platforms, has created the need for an integrated, fully-automated, rapid, and high-throughput annotation capability that is also easy-to-use. Here, we present a web-based software application, Annotation of Genome Sequences (AGeS), which incorporates publicly-available and in-house-developed bioinformatics tools and databases, many of which are parallelized for high-throughput performance. The current version of AGeS provides annotations for bacterial genome sequences, and serves as a readily-accessible resource to Department of Defense (DoD) scientists for storing, annotating and visualizing genomes of newly-sequenced pathogens of interest. The AGeS system is composed of two major components. The first component is a web-based application that provides a graphical user interface for managing users' input genomes, submitting annotation jobs, and visualizing results. Sequence contigs are uploaded as a multi-FASTA input file and submitted for annotation, and the resulting annotations are visualized through GBrowse. The input genome sequences and the annotation results are stored in a secure, customized database. The second component is a high-throughput annotation pipeline for finding the genomic regions that code for proteins, RNAs and other genomic elements through a Do-It-Yourself Annotation framework. The pipeline also functionally annotates the protein-coding regions using an in-house-developed high-throughput pipeline, the Pipeline for Protein Annotation. The annotation pipeline has been deployed on the Mana Linux cluster at the Maui High Performance Computing Center. The two components are connected together using the DoD user interface toolkit application programming interface. The AGeS system was evaluated for scaling of its parallel execution and annotation performance. AGeS scaled with super-linear speedup for up to 128 processors.
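
Note on "super-linear speedup": the abstract does not define the term or give timings, so the following is only a minimal sketch of the usual definition, in which speedup on p processors is the one-processor run time divided by the p-processor run time, and the speedup is super-linear when it exceeds p. The function names and the timing values are hypothetical illustrations, not figures from the AGeS evaluation.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup of a parallel run relative to the one-processor run."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, processors: int) -> float:
    """Parallel efficiency: speedup divided by the processor count."""
    return speedup(t_serial, t_parallel) / processors


def is_superlinear(t_serial: float, t_parallel: float, processors: int) -> bool:
    """True when the speedup exceeds the number of processors used."""
    return speedup(t_serial, t_parallel) > processors


# Hypothetical timings in seconds, chosen only to illustrate the definition.
t_1 = 1280.0    # assumed run time on 1 processor
t_128 = 8.0     # assumed run time on 128 processors

print(speedup(t_1, t_128))              # 160.0
print(efficiency(t_1, t_128, 128))      # 1.25
print(is_superlinear(t_1, t_128, 128))  # True
```

An efficiency above 1.0 is the same condition expressed per processor. In practice it is often attributed to effects such as a larger aggregate cache or memory across nodes, although the abstract does not state the cause for AGeS.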


Descriptors : *GENOME , *HIGH PERFORMANCE COMPUTING , BACTERIA , COMPUTER PROGRAMS , DATA BASES , GRAPHICAL USER INTERFACE , INTERNET , SEQUENCES , THROUGHPUT


Subject Categories : Genetic Engineering and Molecular Biology
      Computer Programming and Software
      Computer Systems


Distribution Statement : APPROVED FOR PUBLIC RELEASE