Laboratories

Laboratory of Large-Scale Bioinformatics
Recruiting students for the academic year 2018

Professor Masahiro KASAHARA
E-mail: mkasa{at}k.u-tokyo.ac.jp
Lab HP

Introduction

【Key Words】Genome Informatics, Genome Assembly, High-Performance Science, Distributed Parallel Programming

 The International Human Genome Project started in 1990 and spent 13 years and 3 billion US dollars to sequence the equivalent of a single haploid human genome. Since then, the cost of DNA sequencing has fallen by a factor of about 3 million over the past two decades. In January 2014, Illumina announced a new DNA sequencer, the HiSeq X Ten, that can decode one human genome for only about 1,000 US dollars. Imagine seeing satellite images in Google Maps over the Internet only 20 years after Christopher Columbus reached America. We are witnessing a technological advance that rapid, and we are excited about what the new technologies will let us find.
 Furthermore, in October 2016 Oxford Nanopore announced a remarkably improved version of its DNA sequencer, which is shaped like a USB flash drive and costs 1,000 US dollars. It is portable and easy to use, so even amateur researchers can buy and use it, whereas the Illumina HiSeq X Ten requires several million US dollars upfront, which ruled out amateur use completely. Oxford Nanopore also says it plans to release SmidgION, a DNA sequencer that attaches to a smartphone. The company claims that a single sequencing run would cost only tens of dollars.
 Such rapid and drastic advances in DNA sequencing technology affect not only how we do research but also our daily lives. If I had tried to add a camera to a mobile phone in 1997, everyone might have thought I had gone crazy and was doing something pointless, but no one would think so today. I believe that in 2025 DNA sequencers will be used in daily life just as smartphones with cameras are today. What will you use DNA sequencing for in 2025? Most functional elements in genomes will have been identified. The causes of most common heritable diseases will have been identified. We will probably have learned how cancers arise and evolve. The common cold will be classified into more specific diseases according to the pathogen, which is easy to identify by DNA/RNA sequencing. We will no longer need broad-spectrum antibiotics and will worry less about multidrug-resistant pathogens. We may even identify the species served in a sushi restaurant to reveal whether the fish were taken illegally.
 However, such a future will not be realized without computational analysis methods for new and large data. A large reduction in DNA sequencing cost inherently demands a comparable reduction in analysis cost. We aim to develop analysis software for new measurement devices, including emerging DNA sequencers.

New technologies, new algorithms

 We are developing algorithms and software for new technologies in molecular biology such as PacBio Sequel, Oxford Nanopore MinION, and 10X Chromium. More specifically, we are developing a variety of fundamental genome analysis software and algorithms for sequence alignment, genome assembly, genome comparison, graph genome analysis, construction of genetic maps, genotype calling, and so on.
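
 To give a flavor of the simplest of these problems, the sketch below computes a global sequence alignment score with the textbook Needleman-Wunsch dynamic program. It is an illustration only, not the laboratory's software; the function name and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for the example.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -1) -> int:
    """Score of the best global alignment of a and b (linear gap penalty).

    Textbook Needleman-Wunsch, keeping only two rows of the DP table,
    so memory is O(len(b)) and time is O(len(a) * len(b)).
    """
    # Row 0: aligning the empty prefix of a against prefixes of b (all gaps).
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against the empty prefix of b.
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap
            left = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr[j] = max(diag, up, left)
        prev = curr
    return prev[len(b)]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

 The quadratic running time of this program is fine for short sequences but impractical for long reads or whole genomes, which is one reason new sequencing technologies call for new algorithms.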

Private cloud middleware for large-scale analysis of sequence data

 The processing speed of computers improves more slowly than the volume of data generated by DNA sequencers grows. We need to address this gap in part by parallel computation, but that increases programming costs significantly. Traditionally, researchers in High Performance Computing (HPC) have worked to use computational resources efficiently. It was perfectly acceptable for them to spend a few years writing efficient code that makes full use of the transistors in a CPU. However, in sequence analysis, where rapid improvements in technology bring a new problem setting every three months, it is pointless to spend a few years optimizing code.
 To better explain the situation, we propose a new term, “High Performance Science (HPS)”, in contrast to traditional HPC. What we want to maximize is the scientific knowledge we obtain, not the utilization of transistors in a CPU. To this end, we are developing middleware (software that sits between users and the OS and makes applications easy to create) that will allow efficient parallel processing of huge datasets for rapid verification of hypotheses in the life sciences.
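
 As a rough picture of the programming pattern such middleware is meant to hide, the sketch below splits a set of reads into chunks, counts k-mers in each chunk concurrently, and merges the partial results. It uses only the Python standard library and is not the laboratory's middleware; the function names and the worker count are assumptions made for the example. A thread pool is used to keep the sketch portable; for CPU-bound work in Python, a process pool or a cluster scheduler would be the realistic choice.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List


def count_kmers(read: str, k: int) -> Counter:
    """Count every length-k substring of a single read."""
    return Counter(read[i:i + k] for i in range(len(read) - k + 1))


def count_kmers_parallel(reads: List[str], k: int, workers: int = 4) -> Counter:
    """Split reads into chunks, count each chunk concurrently, then merge."""
    # Round-robin partitioning: chunk i gets reads i, i + workers, ...
    chunks = [reads[i::workers] for i in range(workers)]

    def count_chunk(chunk: List[str]) -> Counter:
        total = Counter()
        for read in chunk:
            total.update(count_kmers(read, k))
        return total

    # "Map" step: one task per chunk.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_chunk, chunks))

    # "Reduce" step: add up the per-chunk counts.
    merged = Counter()
    for partial in partials:
        merged.update(partial)
    return merged


if __name__ == "__main__":
    reads = ["ACGT", "CGTA", "ACGT"]
    print(count_kmers_parallel(reads, k=2))
```

 Even in this toy case the partitioning, scheduling, and merging code is longer than the analysis itself, which is the programming cost that middleware is intended to reduce.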

Research at our lab

 We have collaboration projects with other laboratories, so students can work with real experimental data from recent technologies if they wish. We welcome international students with programming skills and an interest in biology.

The University of Tokyo
Graduate School of Frontier Sciences, The University of Tokyo
