Computational Biology Group/Intra-University Cooperative Laboratories
Nakato Laboratory
(Laboratory of Biological Functions, IQB)

Data-driven genomic analysis for breakthrough discoveries

The "genome" is all the genetic information passed down through generations. Genome-wide analysis using next-generation sequencing (NGS) has become a central method in genomics and epigenomics. It can capture many kinds of genomic information, including gene transcription levels, protein-DNA binding, DNA methylation, genome replication, and the three-dimensional structure of the genome. With the rapid increase in available genomic and epigenomic data in recent years, there is great demand for "data-driven large-scale NGS analysis" that analyzes large amounts of NGS data simultaneously to make breakthrough discoveries that challenge previous understanding. Our lab is developing a system for such large-scale multi-omics analysis and aims to systematically understand how genomic events such as transcription are coordinated and integrated across the genome.

Epigenome, 3D genome, Data-driven analysis, multi-omics, cohesin
Development of a robust computational platform for data-driven epigenome analysis

We aim to develop a computational platform for comparative epigenomic analysis of datasets obtained from multiple NGS assays (Figure). This platform will integrate ChIP-seq, RNA-seq, Hi-C and other types of genomic data (e.g., GWAS annotation) and implement semi-automated genome annotation to characterize the entire genomic region in more detail. We will also develop a data imputation strategy that predicts the enrichment pattern of ChIP-seq and other epigenomic data by leveraging information from other relevant cell lines and epigenomic marks available in existing databases. This approach can reduce technical noise in raw NGS data and generate virtual data of missing samples in a large dataset in silico. This platform can significantly reduce the cost of both NGS data generation and computational analysis for large-scale epigenomic analysis, especially for valuable biological samples (e.g., clinical samples of rare diseases).
We have developed a new method, HiC1Dmetrics, which efficiently extracts a variety of one-dimensional features from Hi-C data. This method facilitates epigenomic analysis that takes three-dimensional structure into account, which has been difficult with traditional approaches.

  • A comparative epigenomic analysis platform

  • Overview of HiC1Dmetrics
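
As a rough illustration of what a "one-dimensional feature" of Hi-C data can look like, the sketch below computes a simple insulation-style profile from a toy contact matrix. It is not the HiC1Dmetrics implementation or its API; the function name `insulation_profile`, the `window` parameter, and the toy matrix are all assumptions made for this example.

```python
def insulation_profile(matrix, window):
    """Return one value per bin: the mean contact frequency in the square
    of size window x window that spans the bin (upstream rows, downstream
    columns). Bins too close to either end of the matrix get None.

    Illustrative only; function name and parameters are hypothetical.
    """
    n = len(matrix)
    profile = []
    for i in range(n):
        # Skip bins where the window would run off the matrix.
        if i - window < 0 or i + window >= n:
            profile.append(None)
            continue
        total = 0.0
        for r in range(i - window, i):              # bins upstream of i
            for c in range(i + 1, i + window + 1):  # bins downstream of i
                total += matrix[r][c]
        profile.append(total / (window * window))
    return profile


# Toy 8-bin matrix with two domains (bins 0-3 and 4-7): contacts are
# strong (10) inside a domain and weak (1) between domains.
toy = [[10.0 if (r // 4) == (c // 4) else 1.0 for c in range(8)]
       for r in range(8)]

profile = insulation_profile(toy, window=2)
# The profile is lowest at bins 3 and 4, next to the domain boundary.
```

A two-dimensional contact matrix is thus reduced to a single track along the genome, which can then be compared with other one-dimensional data such as ChIP-seq or RNA-seq signals.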

Functional analysis of cohesin complex

From a biological perspective, we are interested in gene expression and three-dimensional chromatin folding regulated by the cohesin complex. Cohesin is involved in gene regulation through various mechanisms, including the mediation or insulation of enhancer-promoter loops, the formation of topologically associating domains (TADs) by loop extrusion, and RNA polymerase II elongation. In these roles it cooperates with various factors such as the cohesin loader, cohesin acetyltransferase, the insulator-binding factor CTCF, and the super-elongation complex (Figure). Although mutations in cohesin and its related factors cause Cornelia de Lange syndrome (CdLS), a developmental syndrome with a complex phenotype, as well as several cancers, the underlying molecular mechanism is still unclear. We are performing large-scale multi-omics analysis using Hi-C, RNA-seq, and ChIP-seq on human and mouse cells to comprehensively elucidate the diverse functions of cohesin and their relationship to diseases caused by cohesin mutations.

  • The proposed models for cohesin function

  • Comparative multi-omics analysis with the depletion of cohesin and related factors

Various analyses using single-cell methods

Single-cell analysis, which observes genomic information at the single-cell level, is used to study cellular heterogeneity in tissues and cell differentiation trajectories. We have developed ShortCake, a computational platform for single-cell analysis. It can dramatically reduce the cost of installing tools, which is an initial barrier for many researchers. We also promote single-cell analysis through workshops that use ShortCake.
We are also interested in gene network analysis using single-cell data, and have developed a new approach, EEISP, which robustly estimates gene co-expression and mutual exclusivity from sparse scRNA-seq data. Network-level comparison allows us to identify novel candidate marker genes that could not be obtained by conventional gene expression variation analysis.

  • Overview of ShortCake, the single-cell analysis platform

  • Gene co-expression network estimation and network comparison from sparse scRNA-seq data
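
To make the idea of co-expression and mutual exclusivity in sparse data more concrete, the sketch below counts in how many cells two genes are detected together or separately, and compares the observed co-detection with what independence would predict. It is a simplified stand-in, not the scoring used by EEISP; the function names, the pseudocount, and the toy expression vectors are assumptions made for this example.

```python
import math


def pair_counts(expr_a, expr_b):
    """Count cells where both genes, only gene A, or only gene B are
    detected (count > 0). Inputs are per-cell counts of equal length."""
    both = only_a = only_b = 0
    for a, b in zip(expr_a, expr_b):
        if a > 0 and b > 0:
            both += 1
        elif a > 0:
            only_a += 1
        elif b > 0:
            only_b += 1
    return both, only_a, only_b


def cooccurrence_log_ratio(expr_a, expr_b, pseudocount=0.5):
    """log2 of observed vs. expected co-detection under independence.
    Positive values suggest co-expression, negative values suggest
    mutual exclusivity. Hypothetical score, for illustration only."""
    n_cells = len(expr_a)
    both, only_a, only_b = pair_counts(expr_a, expr_b)
    expected = (both + only_a) * (both + only_b) / n_cells
    return math.log2((both + pseudocount) / (expected + pseudocount))


# Toy counts for three genes across eight cells (made-up data).
gene_x = [3, 0, 5, 0, 2, 0, 1, 0]
gene_y = [1, 0, 2, 0, 4, 0, 0, 0]  # mostly detected in the same cells as gene_x
gene_z = [0, 2, 0, 3, 0, 1, 0, 4]  # detected only where gene_x is absent

score_xy = cooccurrence_log_ratio(gene_x, gene_y)  # positive
score_xz = cooccurrence_log_ratio(gene_x, gene_z)  # negative
```

Scores of this kind for every gene pair define the edges of a network, and networks built from different samples can then be compared with each other.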

  • 1. Nakato R et al., Context-dependent perturbations in chromatin folding and the transcriptome by cohesin and related factors. Nature Communications, 2023.
  • 2. Wang J et al., Large-scale multi-omics analysis suggests specific roles for intragenic cohesin in transcriptional regulation. Nature Communications, 2022.
  • 3. Wang J, Nakato R. HiC1Dmetrics: framework to extract various one-dimensional features from chromosome structure data. Briefings in Bioinformatics, 2021.
  • 4. Nakajima et al., Codependency and mutual exclusivity for gene community detection from sparse single-cell transcriptome data. Nucleic Acids Research, 2021.
  • 5. Nakato R, Sakata T. Methods for ChIP-seq analysis: A practical workflow and advanced applications. Methods, 2020.
  • 6. Nakato R et al., Comprehensive epigenome characterization reveals diverse transcriptional regulation across human vascular endothelial cells. Epigenetics & Chromatin, 2019.

We welcome enthusiastic and motivated biology and computer science students interested in computational genomics using large datasets from multiple NGS assays (e.g., ChIP-seq, RNA-seq, Hi-C, and single-cell analysis).
The field of large-scale genome analysis is still growing, and you will encounter various difficulties and questions when analyzing real genomic data. We help students in our lab build the knowledge and experience needed to answer these questions, so that they can become the next generation of researchers who make fundamental discoveries from large and heterogeneous datasets.
If you are interested in joining our lab, please first visit our lab website. You are also welcome to visit our lab if you are in the area. We are all looking forward to meeting you!
