
【Key Words】 DNA, omics, single-cell analysis, algorithms, parallel computation, machine learning
Such variations enhance gene functionality and bring about diversity in organisms, but they are also factors in disease. It is therefore vital to investigate in detail the nature of DNA, which is neither immutable nor invariant. Collecting and analyzing massive amounts of DNA data is a promising approach to these fundamental questions, and it demands computationally efficient methods that can analyze large volumes of data rapidly and with high accuracy.
We have studied chromosome evolution in vertebrates and insects over the past 600 million to 1 billion years, comparing the DNA of humans, chickens, killifish, and puffer fish. We confirmed Ohno's conjecture (1970) of two rounds of whole-genome duplication in early vertebrates (Fig. 1).
Fig. 1: Chromosomal evolution in vertebrates
(Source: Nature, 447:714-719, 2007; Genome Research, 17(9):1254-1265, 2007)
On scales of decades to millions of years, we see relatively small genetic variations, such as substitutions, insertions, and deletions. These contribute to phenotype differentiation, and in turn to issues such as genetic diseases. Decoding individual genomes allows us to detect these changes; however, reading the approximately 3 billion base pairs that constitute human DNA is no trivial task. We have developed a system that uses ultra-high-speed sequencers and parallel processing computers to analyze the variations in one person's DNA in about a day on average. Aided by this technology, we are working with the University of Tokyo Hospital to search for genes and genomic regions specific to ethnic Japanese and to develop a reference DNA sequence for a typical Japanese individual. We are also looking for genetic changes linked to brain and other diseases.
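The three kinds of small variation named above can be illustrated with a minimal sketch. It assumes the sample has already been aligned to a reference, with `-` marking gaps; the function name `call_variants` and the tuple layout are hypothetical and are not taken from the system described here, which operates on sequencer reads at a far larger scale.

```python
from typing import List, Tuple

# (0-based position on the ungapped reference, kind, reference allele, sample allele)
Variant = Tuple[int, str, str, str]


def call_variants(ref_aln: str, alt_aln: str) -> List[Variant]:
    """Classify differences between two gapped, equal-length aligned sequences."""
    if len(ref_aln) != len(alt_aln):
        raise ValueError("aligned sequences must have equal length")

    variants: List[Variant] = []
    ref_pos = 0  # coordinate on the reference with gaps removed
    i, n = 0, len(ref_aln)
    while i < n:
        r, a = ref_aln[i], alt_aln[i]
        if r == "-":
            # Run of gaps in the reference: the sample carries extra bases.
            j = i
            while j < n and ref_aln[j] == "-":
                j += 1
            inserted = alt_aln[i:j].replace("-", "")
            if inserted:
                variants.append((ref_pos, "insertion", "", inserted))
            i = j
        elif a == "-":
            # Run of gaps in the sample: reference bases are missing.
            j = i
            while j < n and alt_aln[j] == "-" and ref_aln[j] != "-":
                j += 1
            variants.append((ref_pos, "deletion", ref_aln[i:j], ""))
            ref_pos += j - i
            i = j
        else:
            if r != a:
                variants.append((ref_pos, "substitution", r, a))
            ref_pos += 1
            i += 1
    return variants


if __name__ == "__main__":
    # Toy alignment containing one variant of each kind.
    for variant in call_variants("ACGT-ACGTA", "ACCTGAC--A"):
        print(variant)
```

Reporting insertions at the reference position they precede keeps all three variant kinds on a single coordinate system, which is a design choice of this sketch rather than a requirement.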
DNA is wrapped around histone octamers and bundled tightly enough to fit within a 10 μm nucleus, though human DNA is nearly 2 m long if stretched out. This leads to some fundamental questions: How is it that such a three-dimensional structure does not become tangled when it is copied and distributed into two nuclei? If two genes that work in concert are encoded far apart, does this imply that they come into proximity after DNA folding?
One interesting phenomenon that we have reported with regard to chromatin structure is an approximately 200 bp periodicity in genetic variations, such as single nucleotide variations and short indels, downstream of transcription initiation sites, and its association with nucleosome positioning (Fig. 2). In vertebrates, the methylated C in CpG frequently mutates to T. We have also observed that mutations are more likely to arise in the regions around methylated Cs (Fig. 3), a further indication that the mechanisms inducing changes in DNA remain poorly understood.
Fig. 2: Chromatin-correlated genetic variation
(Source: Science, 323(5912):401-404, 2009)
Fig. 3: More mutations are seen around methylated CpGs
(Source: Genome Research, 22(8):1419-1425, 2012)
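One generic way to look for a periodicity such as the roughly 200 bp signal described above is to autocorrelate a per-position variant density and find the lag with the strongest correlation. The sketch below uses a synthetic signal and hypothetical function names (`autocorrelation`, `dominant_period`); it is an illustration of the idea and not the analysis used in the cited study.

```python
from typing import List, Sequence


def autocorrelation(signal: Sequence[float], max_lag: int) -> List[float]:
    """Autocorrelation for lags 0..max_lag, normalized by the total variance."""
    n = len(signal)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the signal length")
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    total = sum(c * c for c in centered)
    if total == 0:
        raise ValueError("signal is constant; autocorrelation is undefined")
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / total
        for lag in range(max_lag + 1)
    ]


def dominant_period(signal: Sequence[float], min_lag: int, max_lag: int) -> int:
    """Return the lag in [min_lag, max_lag] with the highest autocorrelation."""
    acf = autocorrelation(signal, max_lag)
    return max(range(min_lag, max_lag + 1), key=lambda lag: acf[lag])


if __name__ == "__main__":
    # Synthetic density (an assumption for illustration): a 20 bp stretch with
    # elevated variant counts, repeated every 200 bp across a 2,000 bp window.
    density = [5.0 if pos % 200 < 20 else 1.0 for pos in range(2000)]
    print(dominant_period(density, min_lag=100, max_lag=400))
```

On real data the density would typically be aggregated over many genes aligned at their transcription initiation sites, and the peak would be far noisier than in this toy signal.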
Once the DNA of a species has been sequenced, it becomes possible to modify the DNA so that a given gene is knocked out or forcibly expressed. Disrupting essential genes is generally lethal, but knocking out non-essential genes results in slight changes to the phenotype. Analyzing fluorescence microscopy images of disruptants of non-essential genes makes it possible to treat morphological variations as quantitative traits. Our image analysis server is available on the Web (Fig. 4).
Fig. 4: Raw images of budding yeast (left) and processed images (right)
(Source: PNAS, 102(52):19015-20, 2005)
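To make "morphology as a quantitative trait" concrete, the following sketch turns a segmented cell image into a few numbers. It assumes a binary mask (1 for cell pixels, 0 for background) has already been produced; the function `shape_traits` and the traits it reports are hypothetical choices for illustration and are not the measurements computed by the image analysis server.

```python
import math
from typing import Dict, Sequence


def shape_traits(mask: Sequence[Sequence[int]]) -> Dict[str, object]:
    """Compute area, centroid, and axis ratio of the nonzero pixels in a mask."""
    pixels = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    area = len(pixels)
    if area == 0:
        raise ValueError("mask contains no cell pixels")

    # Centroid of the cell pixels (row, column).
    cr = sum(r for r, _ in pixels) / area
    cc = sum(c for _, c in pixels) / area

    # Second central moments: entries of the 2x2 covariance of pixel coordinates.
    srr = sum((r - cr) ** 2 for r, _ in pixels) / area
    scc = sum((c - cc) ** 2 for _, c in pixels) / area
    src = sum((r - cr) * (c - cc) for r, c in pixels) / area

    # Eigenvalues of the covariance give the spread along the major and minor axes.
    half_trace = (srr + scc) / 2
    root = math.sqrt(((srr - scc) / 2) ** 2 + src ** 2)
    major, minor = half_trace + root, half_trace - root
    axis_ratio = math.sqrt(major / minor) if minor > 0 else math.inf

    return {"area": area, "centroid": (cr, cc), "axis_ratio": axis_ratio}


if __name__ == "__main__":
    # Toy mask: an elongated 2 x 8 block of cell pixels with a background border.
    toy = [[0] * 10] + [[0] + [1] * 8 + [0] for _ in range(2)] + [[0] * 10]
    print(shape_traits(toy))
```

Comparing such values between a wild-type strain and a disruptant is one simple way in which a shape difference becomes a measurable trait.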