Laboratory of Informatics of Molecular Functions (AIST) does not recruit students for the academic year 2018

Associate Professor Paul HORTON
E-mail: horton-p{at}
Lab HP


Team members

 The CBRC sequence analysis team is made up of six talented researchers, including Martin Frith and Kentaro Tomii. Like me, Dr. Tomii is a visiting professor, and he is the developer of software such as the protein structure prediction tool FORTE. Our researchers specialize in various fields, including mathematics, algorithms, software development, web server construction and operation, and microbiology. Our members come from five countries and speak three languages (Japanese, English, and Chinese), and we have a wonderful time working together.

 Our lab performs genome analysis alongside the CBRC sequence analysis team.

Why genome analysis?

 Our genome is a condensed record of 3.8 billion years of evolutionary experience that determines the fate of every cell and every individual. Genome sequences are compact, allowing for digital representations that are easily manipulated by computers; precise, containing little error compared to other observational data from experiments; and universal, in that, unlike gene expression, they differ little between the cells of a given individual.

What kind of analysis?

 We apply various techniques to genome analysis, including string data structures such as suffix arrays and suffix trees; optimization algorithms such as dynamic programming, expectation maximization, and branch and bound; and machine learning and statistical processing methods such as mixture models, weighted k-nearest neighbors, minimum spanning trees, and self-organizing maps. Even the best algorithms are worthless without equally good implementations, so we also place a strong emphasis on software development. Most of our work is done on Linux, and the new algorithms we develop are freely distributed.
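 As a small illustration of the dynamic programming mentioned above, the following sketch computes a Smith-Waterman style local alignment score between two sequences. It is not the lab's software: the function name and the scoring parameters (match, mismatch, gap) are assumptions chosen for the example, and it uses a simple linear gap penalty.

```python
def local_alignment_score(a: str, b: str,
                          match: int = 2,
                          mismatch: int = -1,
                          gap: int = -2) -> int:
    """Return the best local alignment score of a and b.

    Illustrative scoring only (hypothetical parameters): +2 per match,
    -1 per mismatch, -2 per gap character.
    """
    prev = [0] * (len(b) + 1)   # scores for the previous row of the DP table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(local_alignment_score("TTACGTTT", "GGACGGG"))
```

 Only two rows of the table are kept at a time, so memory use grows with the shorter dimension rather than with the product of the two sequence lengths; recovering the alignment itself would require keeping the full table or a traceback.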

Recent results

Mitochondrial β-barrel proteins
 As in bacteria, the outer membrane of mitochondria contains proteins with structures called β-barrels. Our recent research indicates that the number of types of such structures in mitochondria is very likely less than 10% of the number found in bacteria (Imai et al., 2008).

String algorithms
 We investigate topics such as upper limits on motif extraction computation and improved algorithms for suffix array construction.
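 For readers unfamiliar with suffix arrays, here is a minimal sketch of what the data structure is and how it supports pattern search. It builds the array by naively sorting all suffixes, which is far slower than the improved construction algorithms studied in the lab and is shown only to make the idea concrete. The function names are hypothetical.

```python
def build_suffix_array(text: str) -> list[int]:
    """Naive construction: sort suffix start positions by the suffix text.

    Simple but slow for long sequences; practical construction
    algorithms avoid comparing whole suffixes.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return all start positions of pattern in text, in increasing order."""
    m = len(pattern)

    # Binary search for the first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Binary search for the first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix in sa[start:lo] begins with the pattern.
    return sorted(sa[start:lo])


if __name__ == "__main__":
    genome = "GATTACATTA"
    sa = build_suffix_array(genome)
    print(find_occurrences(genome, sa, "TTA"))
```

 Because the suffixes are stored in sorted order, all occurrences of a pattern form one contiguous block of the array, which is why two binary searches are enough to find them.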

Genome sequence analysis software
 We have created a sequence similarity search program called LAST that copes well with repeated sequences and uses our improved method of suffix array construction.

Student environment and activities

 Established in 2007, our lab has only just begun its research. In 2008 we accepted one student each from the Master’s and Doctoral programs, and we plan to accept one more Master’s student in 2009. We hold relaxed weekly group discussions at which students can freely ask CBRC sequence analysis team researchers for advice.
 Our students are currently studying topics such as protein nuclear localization signals and mitochondrial genome evolution.

A suffix array construction algorithm, improved for use in genome analysis (left)
Mr.cell, an illustration for explaining localization (and our lab mascot) (right)

Recent publications

K. Imai, M.M. Gromiha & P. Horton, “Mitochondrial β-Barrel Proteins, an Exclusive Club?”, Cell, 135:1158-9, 2008.
M. Frith et al., “Discovering Sequence Motifs with Arbitrary Insertions and Deletions”, PLoS Comput Biol, 4(5):e1000071, 2008.
P. Horton, “Position and sequence analysis of nucleosomes”, Farumashia, 44(4):352-3, 2008.
P. Horton et al., “WoLF PSORT: Protein Localization Predictor”, NAR, doi:10.1093/nar/gkm259, 2007.
P. Horton & W. Fujibuchi, “An Upper Bound on the Hardness of Exact Matrix Based Motif Discovery”, Journal of Discrete Algorithms, 5(4):706-13, 2007.
K. Nakai & P. Horton, “Computational Prediction of Subcellular Localization”, Methods Mol Biol, 390:429-66, 2007.


The University of Tokyo
Graduate School of Frontier Sciences, The University of Tokyo
