|
Research
|
Overview
|
My research areas are Data Science and Computational Life Science, with a specific focus on the semantics and analysis of big, heterogeneous data and on computational genomics and transcriptomics, particularly the exploration of next-generation sequencing data. My up-to-date CV can be found here. In my prior multidisciplinary research, I have organized and led close collaborations among scientists from a wide range of areas: computer science, genome and transcriptome biology, bioinformatics/biomedical informatics, and clinical science. Collaborators on my team come from various institutions, including the University of Utah School of Medicine, the University of Oregon, The Jackson Laboratory, Georgetown University Medical Center, and the University at Buffalo. In addition, our research team has both an in-house Ion Torrent Personal Genome Machine and active accounts on the SGI Ultraviolet and DMC clusters at the Alabama Supercomputer Center, which significantly facilitates our recent research focus on one challenging problem in human genomics: how to effectively characterize microRNA::mRNA regulatory interactions. |
|
Prior biological and biomedical research has indicated that microRNAs
(a.k.a. miRNAs or miRs) perform critical roles in biological processes by regulating
their respective target genes. Thus, miRs are closely associated with various human
cancers and have shown great potential in many aspects: from early diagnosis to
personalized treatment and prognosis prediction, as well as disease prevention.
Unfortunately, cancer biologists are facing significant challenges and critical
barriers in knowledge discovery, unification, and dissemination in miR research
(microgenomics in general). The manual integration of information on miRs and their
target genes is challenging: it is labor-intensive, error-prone, and subject to biologists’
prior knowledge, because it involves exploring an extremely large number of heterogeneous
data sources.
As the very first semantic computational tool specifically designed
for the miR field, OmniSearch will provide the community with robust and efficient
access to knowledge about the role of miRs, especially as they contribute to biological
processes that are critical to cancer disease processes and responses. We aim to
address challenges of miR data sharing, efficient knowledge unification, and effective
searches. OmniSearch will realize semantic data integration and semantic search
in an automated and highly efficient manner; it will also cross-reference, via a
shared, controlled vocabulary, a wide range of biological and biomedical ontologies
and knowledge bases: Gene Ontology (GO), Sequence Ontology (SO), PRotein Ontology
(PRO), Universal Protein Resource (UniProt), and Neural ElectroMagnetic Ontologies
(NEMO). As a result, OmniSearch will significantly facilitate miR knowledge acquisition
in oncology across a wide scope. OmniSearch can be used by biologists and bioinformaticians
to obtain unified miR knowledge and derive insights into the regulation and control
of cancer disease processes. By providing the community with a deeper understanding
of miRs’ important biological functions, OmniSearch will also assist miR bio-curation
and new biological experiment design. Thus, the project is expected to significantly
accelerate cancer biology research. In addition, OmniSearch is by its nature extensible
and can be readily generalized to other biomedical areas beyond the miR and microgenomics
field. |
|
Future Research Plan
|
I will continue my current research on how to extract meaningful knowledge from large amounts of heterogeneous data, particularly in the exploration of next-generation sequencing (NGS) data. My two ultimate research goals are (1) to make fundamental contributions to the semantics-related aspects of data science (including automated development of large-scale bio-ontologies and more effective semantic data mining) and (2) to investigate effective computational methods that help fully integrate next-generation genomics and transcriptomics into the clinic, so as to better understand the genetic basis of disease, discover relevant genomic hallmarks, and eventually help people live longer, healthier lives.
- Short-term research plan:
  - Comprehensive exploration of miR::mRNA regulatory interactions
- Long-term research plan:
  - Integrative study of next-generation cancer genome sequencing
  - Systematic characterization of human genomic and transcriptomic sequences
|