Bioinformatics
Advances in genomic data acquisition technologies have driven exponential growth in data volume, ushering in the era of biomedical Big Data. In particular, massively parallel sequencing technologies can now produce terabytes of sequence data. The complexity and volume of these data have exceeded the capabilities of traditional analytical methods. Bioinformaticians have therefore focused on developing efficient algorithms and systems, including tools that align DNA/RNA sequences, quantify gene/transcript/protein/metabolite abundance, detect gene polymorphisms, and identify differentially expressed genes/transcripts/proteins/metabolites. Although scientists now have numerous algorithms to choose from, there are no clear guidelines for selecting the appropriate tool or algorithm for a given analytical situation. In addition, researchers have shown that integrating existing knowledge bases can improve analytical results by guiding the selection of appropriate algorithms. Unfortunately, knowledge bases are often heterogeneous and lack standardized data formats, which hinders computational analysis.
Bioinformatics research in the Bio-MIBLab addresses these Big Data challenges by:
- Developing metrics as guidelines for choosing bioinformatics tools
- Developing algorithms for integrated analysis of bioinformatics data
- Developing software tools and infrastructure that leverage distributed computing resources for Big Data analysis
Bioinformatics research in the Bio-MIBLab is divided into the following categories:
- Algorithms, tools, and infrastructure for gene expression microarrays
- Gene expression-based prediction modeling
- Algorithms and analysis of massively parallel genomic sequencing