Accession Number : ADA526660


Title : PIPA: A High-Throughput Pipeline for Protein Function Annotation


Descriptive Note : Conference paper


Corporate Author : BIOTECHNOLOGY HPC SOFTWARE APPLICATIONS INST FORT DETRICK MD


Personal Author(s) : Yu, Chenggang ; Desai, Valmik ; Zavaljevski, Nela ; Reifman, Jaques


Full Text : http://www.dtic.mil/dtic/tr/fulltext/u2/a526660.pdf


Report Date : Jul 2008


Pagination or Media Count : 12


Abstract : Traditional experimental methods to determine the functions of proteins encoded in genomic sequences cannot keep pace with the avalanche of sequence data produced by new high-throughput sequencing technologies. This gap has prompted the development of numerous bioinformatics approaches for automated protein function annotation. However, these approaches frequently use different function classification terminologies, which precludes the integration of multisource predictions. We developed Pipeline for Protein Annotation (PIPA), a genome-wide protein function annotation pipeline that runs in a high-performance computing environment. PIPA integrates different tools and employs the Gene Ontology (GO) to provide consistent annotation and resolve prediction conflicts. PIPA has three modules that allow for easy development of specialized databases and integration of various bioinformatics tools. The first module, the pipeline execution module, consists of programs that give the user access to, and control over, the pipeline's parallel execution of multiple jobs, each of which searches a particular database for a chunk of the input data. The execution module wraps the second module, the core pipeline module. The main components of the core module are the integrated resources, the program for terminology conversion to GO, and the consensus annotation program. The third module, the preprocessing module, contains the program for customized generation of protein function databases and the GO-mapping generation program, which creates GO mappings for the terminology conversion program. The current implementation of PIPA annotates protein functions by combining the results of CatFam, an in-house-developed database for enzyme catalytic function prediction, with the results of multiple integrated resources.
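
Illustrative sketch (not part of the original record): the flow described in the abstract, in which the input is split into chunks, each job searches one database for one chunk, source-specific terms are converted to GO, and a consensus is formed, might look roughly like the Python below. Every name here (`GO_MAP`, `HITS`, `chunk`, `search`, `to_go`, `consensus`, `annotate`) is hypothetical, the GO labels are placeholders rather than real GO identifiers, and the simple vote-count consensus is a stand-in assumption; the abstract does not say how PIPA actually resolves conflicts.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical mapping table: (source, source-specific term) -> placeholder GO label.
GO_MAP = {
    ("source_a", "EC 1.1.1.1"): "GO:term_X",
    ("source_b", "family_17"): "GO:term_X",
    ("source_c", "motif_9"): "GO:term_Y",
}

# Hypothetical search results: source -> {protein id -> source-specific term}.
HITS = {
    "source_a": {"P1": "EC 1.1.1.1", "P2": "EC 1.1.1.1"},
    "source_b": {"P1": "family_17"},
    "source_c": {"P1": "motif_9", "P3": "motif_9"},
}


def chunk(items, size):
    """Split the input list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def search(source, proteins, hits):
    """Stand-in for one job: look up one chunk of proteins in one source."""
    return [(p, source, hits[source][p]) for p in proteins if p in hits[source]]


def to_go(source, term):
    """Convert a source-specific term to GO; returns None if unmapped."""
    return GO_MAP.get((source, term))


def consensus(go_terms, min_votes=2):
    """Assumed rule: keep GO terms predicted at least `min_votes` times."""
    counts = Counter(go_terms)
    return sorted(t for t, n in counts.items() if n >= min_votes)


def annotate(proteins, hits, chunk_size=2):
    """Run one job per (source, chunk) pair, then merge the results per protein."""
    jobs = [(s, c) for s in hits for c in chunk(proteins, chunk_size)]
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda job: search(job[0], job[1], hits), jobs))
    per_protein = {p: [] for p in proteins}
    for batch in results:
        for protein, source, term in batch:
            go = to_go(source, term)
            if go is not None:
                per_protein[protein].append(go)
    return {p: consensus(terms) for p, terms in per_protein.items()}
```

With the sample data above, `annotate(["P1", "P2", "P3"], HITS)` keeps only the placeholder term that two sources agree on for P1 and returns empty lists for P2 and P3. A thread pool stands in for the high-performance computing environment only to keep the sketch self-contained.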


Descriptors : *CLASSIFICATION , *BIOMEDICAL INFORMATION SYSTEMS , *GENOME , SEQUENCES , RELATIONAL DATA BASES , THROUGHPUT , PROTEINS , SYMPOSIA , SOFTWARE ENGINEERING


Subject Categories : Information Science
      Genetic Engineering and Molecular Biology
      Computer Programming and Software


Distribution Statement : APPROVED FOR PUBLIC RELEASE