AI for Genomes: Rethinking de novo Assembly
Abstract
Accurately resolving genomic paths in assembly graphs is a key challenge in de novo genome assembly, especially when repeats create tangles and fragment the assembly. We present a geometric deep learning framework that learns directly from the graph structure. It bypasses conventional heuristics and exploits the symmetries of the problem to produce PacBio HiFi reconstructions with state-of-the-art quality and contiguity. The same approach can be applied to other sequencing technologies. In this talk, we present results for both haploid and diploid genomes.
Our method performs robustly on both simulated and real datasets and is positioned to benefit from the growing number of telomere-to-telomere (T2T) reference genomes. By decoupling path inference from hard-coded strategies and generalising across species and genomic architectures, this framework opens the door to reconstructing highly complex genomes, including those with high ploidy or extensive structural variation.
Speaker Bio
Mile Šikić is a group leader at the Genome Institute of Singapore and a Professor of Computer Science at the University of Zagreb, Croatia. Throughout his scientific career, he has specialised in developing algorithms and AI methods for genomics. His laboratory has created several widely used tools and models, including the HERRO error correction tool, the RiNALMo large RNA language model, the Racon consensus tool, the Raven de novo assembler, and the Edlib sequence aligner. More recently, his lab has shifted its focus towards integrating AI into the de novo assembly process and developing new AI models to make RNA druggable.
During the first decade of his career, Dr Šikić worked on a range of industry projects in computer and mobile networking. He is also an accomplished entrepreneur who has founded several ventures, including a hedge fund.