Laboratories

Computational Biology Group/Core Laboratories
Frith Laboratory
(Laboratory of Large-Scale Bioinformatics)

Finding interesting information in genetic sequences

We look for interesting information in genetic sequences, and develop algorithmic and mathematical methods for doing so. For example, we found animal DNA segments that have been conserved since the Precambrian ancestors of most animals: these segments control gene expression for embryonic development. This reveals a control system for animal development that has been conserved since the common ancestors of humans and corals. In another project, we discovered the oldest ever "protein fossils", segments of formerly protein-coding DNA, by sensitive probability-based analysis. This revealed a great diversity of transposable elements in vertebrate ancestors of the Paleozoic Era. We also found the oldest ever virus fossils: relics of viral DNA inserted into host genomes. In addition, we collaborate with medical geneticists to understand complex chromosome rearrangements, tandem repeat expansions/contractions, and viral DNA insertions that cause disease. We discovered the cause of neuronal intranuclear inclusion disease: a tandem repeat expansion in a human-specific gene. Another project found sequences that are significantly absent from genomes and proteomes, providing clues about immune recognition and pathogen/host adaptation. Finally, we developed a mathematically optimal way to sample a subset of positions in a sequence, for fast analysis of big sequence data.

Research keywords
genome, homology, evolution, probability, repeats
  • Figure: Animals in which each regulatory DNA element was detected. Regulatory segments are colored according to the gene they regulate: homeobox genes red, other transcription factors blue. Each animal’s phylum is written on the left, deuterostomes in purple and protostomes in green.

References/papers
  • DNA conserved in diverse animals since the Precambrian controls genes for embryonic development. Frith MC, Ni S. Mol Biol Evol. 2023 msad275.
  • Improved DNA-Versus-Protein Homology Search for Protein Fossils. Yao Y, Frith MC. IEEE/ACM Trans Comput Biol Bioinform. 2023 20(3):1691-1699.
  • How to optimally sample a sequence for rapid analysis. Frith MC, Shaw J, Spouge JL. Bioinformatics. 2023 39(2):btad057.
  • An immune-suppressing protein in human endogenous retroviruses. Zhang H, Ni S, Frith MC. Bioinform Adv. 2023 3(1):vbad013.
  • Finding Rearrangements in Nanopore DNA Reads with LAST and dnarrange. Frith MC, Mitsuhashi S. Methods Mol Biol. 2023 2632:161-175.
  • Paleozoic Protein Fossils Illuminate the Evolution of Vertebrate Genomes and Transposable Elements. Frith MC. Mol Biol Evol. 2022 39(4):msac068.
  • Significant non-existence of sequences in genomes and proteomes. Koulouras G, Frith MC. Nucleic Acids Res. 2021 49(6):3139-3155.
  • Minimally overlapping words for sequence similarity search. Frith MC, Noe L, Kucherov G. Bioinformatics. 2021 36(22-23):5344-5350.
  • A pipeline for complete characterization of complex germline rearrangements from long DNA reads. Mitsuhashi S, Ohori S, Katoh K, Frith MC, Matsumoto N. Genome Med. 2020 12(1):67.
  • How sequence alignment scores correspond to probability models. Frith MC. Bioinformatics. 2020 36(2):408-415.
  • Long-read sequencing identifies GGC repeat expansions in NOTCH2NLC associated with neuronal intranuclear inclusion disease. Sone J, Mitsuhashi S, Fujita A, Mizuguchi T, Hamanaka K, Mori K, Koike H, Hashiguchi A, Takashima H, Sugiyama H, Kohno Y, Takiyama Y, Maeda K, Doi H, Koyano S, Takeuchi H, Kawamoto M, Kohara N, Ando T, Ieda T, Kita Y, Kokubun N, Tsuboi Y, Katoh K, Kino Y, Katsuno M, Iwasaki Y, Yoshida M, Tanaka F, Suzuki IK, Frith MC, Matsumoto N, Sobue G. Nat Genet. 2019 51(8):1215-1221.
  • A survey of localized sequence rearrangements in human DNA. Frith MC, Khan S. Nucleic Acids Res. 2018 46(4):1661-1673.
  • Training alignment parameters for arbitrary sequencers with LAST-TRAIN. Hamada M, Ono Y, Asai K, Frith MC. Bioinformatics. 2017 33(6):926-928.
  • Split-alignment of genomes finds orthologies more accurately. Frith MC, Kawaguchi R. Genome Biol. 2015 16(1):106.
  • Adaptive seeds tame genomic sequence comparison. Kielbasa SM, Wan R, Sato K, Horton P, Frith MC. Genome Res. 2011 21(3):487-493.
  • A new repeat-masking method enables specific detection of homologous sequences. Frith MC. Nucleic Acids Res. 2011 39(4):e23.