
【Key Words】Database integration, Semantic web, Genome analysis, Functional analysis, Text mining
In life science research, omics projects produce huge amounts of data for many species, including humans, and building databases to handle such data is becoming increasingly important. Our laboratory, the Database Center for Life Science of the Research Organization of Information and Systems, focuses on database integration and its use in life science applications. Research topics include developing databases of genomes and their functions, predicting gene functions from genomic and other omics data, and developing methods for extracting knowledge from the literature to augment current database contents.
To integrate databases developed all over the world, the meaning of the data and their relationships must be defined within a common framework. We have modeled the data using the Resource Description Framework (RDF) and have been developing databases that share common IDs and ontologies. The resulting data form a single large graph, a knowledge graph, that links entries from many databases. We have been developing methodologies both for constructing knowledge graphs and for efficiently extracting data from them. Recently, we developed TogoID (https://togoid.dbcls.jp/) for linking IDs across databases, and TogoDX (https://togodx.dbcls.jp/human/) for retrieving the necessary data from a bird's-eye view of multiple datasets. In addition, we are collaborating with bioinformatics and biological database researchers in Japan to develop databases such as jPOST for proteome data and GlyCosmos for glycome data.
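To make the idea of integration through shared IDs concrete, the following is a minimal sketch, not the laboratory's implementation and not the TogoID or TogoDX API. It assumes a toy in-memory list of (subject, predicate, object) triples from three made-up databases (`dbA`, `dbB`, `dbC`) and uses a hypothetical `ex:sameAs` predicate as a stand-in for a cross-reference link. Following those links gathers everything the graph knows about one entity. Real RDF data would use full IRIs and shared ontologies, and would normally be queried with SPARQL rather than hand-written traversal.

```python
from collections import defaultdict, deque

# Hypothetical triples from three made-up databases; "ex:sameAs" is an
# illustrative stand-in for a cross-reference between IDs.
TRIPLES = [
    ("dbA:gene1", "rdfs:label", "geneX"),
    ("dbA:gene1", "ex:sameAs", "dbB:prot7"),
    ("dbB:prot7", "ex:expressedIn", "ontology:liver"),
    ("dbB:prot7", "ex:sameAs", "dbC:entry42"),
    ("dbC:entry42", "ex:hasPathway", "pathway:P1"),
]


def build_graph(triples):
    """Index triples by subject and record 'ex:sameAs' as an undirected link."""
    out = defaultdict(list)
    same = defaultdict(set)
    for s, p, o in triples:
        out[s].append((p, o))
        if p == "ex:sameAs":
            same[s].add(o)
            same[o].add(s)
    return out, same


def equivalent_ids(start, same):
    """Breadth-first search over ID links to collect all IDs for one entity."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in same[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def integrated_view(start, triples):
    """Gather the non-link statements about an entity from every linked database."""
    out, same = build_graph(triples)
    facts = set()
    for node in equivalent_ids(start, same):
        for p, o in out.get(node, []):
            if p != "ex:sameAs":
                facts.add((p, o))
    return facts


if __name__ == "__main__":
    for fact in sorted(integrated_view("dbA:gene1", TRIPLES)):
        print(fact)
```

Starting from `dbA:gene1`, the sketch also returns the expression and pathway statements stored under `dbB:prot7` and `dbC:entry42`, which is the benefit of connecting databases through common IDs.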
Other possible research topics include:
(1) Moriya, Y., Kawano, S., Okuda, S., Watanabe, Y., Matsumoto, M., Takami, T., Kobayashi, D., Yamanouchi, Y., Araki, N., Yoshizawa, A. C., Tabata, T., Iwasaki, M., Sugiyama, N., Tanaka, S., Goto, S. and Ishihama, Y.; The jPOST environment: an integrated proteomics data repository and database. Nucleic Acids Res. in press (2017).
(2) Moriya, Y., Yamada, T., Okuda, S., Nakagawa, Z., Kotera, M., Tokimatsu, T., Kanehisa, M. and Goto, S.; Identification of enzyme genes using chemical structure alignments of substrate-product pairs. J. Chem. Inf. Model. 56:510-516 (2016).