
【Key Words】Bioinformatics, Computational Biology, Artificial Intelligence, Machine Learning, Biological Science
Our laboratory focuses on making biological discoveries by applying statistical methods to genome-scale data such as genome sequences, microarray data, and next-generation sequencer data. We also develop the new probabilistic and mathematical tools that such analysis requires.
Since the first successes in the 1990s, researchers have decoded the full genomes of thousands of species. The information generated by those efforts is not limited to genome sequences, but also covers other building blocks of life such as RNA, proteins, metabolites, and DNA modifications. However, integrated analysis of such extremely heterogeneous data has only just begun, and many problems await solutions. We apply statistical techniques to detect faint signals in the noise that will lead to a deeper understanding of life.
Various RNA molecules such as messenger RNA, transfer RNA, and microRNA are involved in the expression of proteins. Most RNA molecules form secondary structures through base pairs such as A-U and C-G. The stabilizing energies of secondary structures are relatively large, and they have a significant impact on the regulation and efficiency of gene expression. Very accurate models of RNA secondary structure exist that use a concept from the information sciences called stochastic context-free grammar, and these models allow computer-based investigation of RNA structures. By using such models intensively, we are studying various biological processes involving RNA, such as molecular interactions of microRNA and RNA-binding proteins, alternative splicing, and messenger RNA translation (Fig. 1). We are also investigating RNA structural evolution using genome sequences of human and vertebrate populations (Fig. 2).
Fig. 1: Genome-scale sequence analysis using the software tool Raccess, which calculates RNA accessibility. Raccess is useful for determining which regions are structurally exposed across all transcribed RNA regions.
Fig. 2: (Left) Phylogenetic tree analysis of local evolution in 28 vertebrate species, including human, calculated using Fdur, a program that we developed. (Right) Base substitution patterns in microRNA regions calculated by Fdur.
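To make the idea of computing secondary structure from base pairs concrete, the following is a minimal sketch of Nussinov-style dynamic programming, which finds the maximum number of nested A-U and C-G pairs in a sequence. It is a deliberately simplified stand-in: it does not use energies or stochastic context-free grammars, and it is not the method implemented in Raccess or Fdur. The function name `max_base_pairs` and the `min_loop` parameter (minimum number of unpaired bases enclosed by a pair) are assumptions introduced for illustration.

```python
# Allowed Watson-Crick pairs only (G-U wobble pairs are ignored in this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def max_base_pairs(seq, min_loop=3):
    """Return the maximum number of nested base pairs in seq.

    min_loop is the minimum number of unpaired bases that a pair must
    enclose (an illustrative assumption; 3 is a common choice).
    """
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for the subsequence seq[i..j], inclusive.
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j stays unpaired
            # case 2: base j pairs with some base k, splitting the interval
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    inner = dp[k + 1][j - 1]
                    best = max(best, left + 1 + inner)
            dp[i][j] = best
    return dp[0][n - 1]
```

For example, `max_base_pairs("GGGAAAUCCC")` counts the three G-C pairs of a simple hairpin stem. Grammar-based and energy-based models follow the same recursive decomposition but score each structural element instead of counting pairs.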
Cancers are diseases in which cells multiply uncontrollably, and they are often caused by the accumulation of DNA mutations. In many types of cancer, each cell division introduces various types of mutations into the genome sequence. Since such changes in cancer genomes resemble genome evolution during speciation, we can use various evolutionary and genetic tools to study cancer progression. We are using tools from population genetics, such as Markov processes and coalescent theory, to estimate the growth of cancer tissues. We are seeking methods for computing the probabilities of cancer metastasis or recurrence from the estimated quantitative data.
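As a small illustration of the coalescent theory mentioned above, the sketch below simulates the time to the most recent common ancestor of n sampled lineages under the standard Kingman coalescent and compares it with the known expectation 2(1 - 1/n), measured in coalescent time units. It assumes a constant population size, which is a strong simplification for a growing tumor, and it is not the laboratory's estimation procedure. All function names are hypothetical.

```python
import random


def expected_tmrca(n):
    """Analytic expectation of the time to the most recent common ancestor.

    With k lineages the coalescence rate is k*(k-1)/2, so the expected
    waiting time is 2/(k*(k-1)); summing over k = 2..n gives 2*(1 - 1/n).
    """
    return sum(2.0 / (k * (k - 1)) for k in range(2, n + 1))


def simulate_tmrca(n, rng):
    """Draw one coalescent genealogy and return its total depth."""
    t = 0.0
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2.0      # number of lineage pairs that can merge
        t += rng.expovariate(rate)    # exponential waiting time to next merger
    return t


def mean_simulated_tmrca(n, replicates, seed=0):
    """Average the simulated depth over many independent genealogies."""
    rng = random.Random(seed)
    total = sum(simulate_tmrca(n, rng) for _ in range(replicates))
    return total / replicates
```

With enough replicates, `mean_simulated_tmrca(10, 20000)` should lie close to `expected_tmrca(10)`. Modeling tumor growth would require replacing the constant coalescence rate with one that depends on a time-varying population size.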
Embryonic development in animals begins with the cleavage of fertilized eggs, followed by gastrulation and mesoderm differentiation, which results in the formation of organs, bones, and muscles. Such macroscopic changes in animal morphology are precisely controlled through complex interactions between transcription networks and signaling molecules. However, the technologies for making predictions about those mechanisms from cell-level data, such as transcription factor binding and histone modifications, are still in their infancy. We are developing methods that combine differential equations for embryonic development from mathematical biology with Bayesian analysis of gene regulatory networks from bioinformatics, in order to associate macroscopic stages of embryonic development with microscopic sequencing data. Our aim is to simulate animal developmental processes using sequencing data.
We perform our research in close association with the Computational Biology Research Center at the National Institute of Advanced Industrial Science and Technology.
As we are a “dry” laboratory with no experimental facilities, we use computers to analyze public data and the experimental results of collaborating labs, rather than generating our own experimental results.