Published on: 3rd June 2025
A new tool designed for accurate detection of structural variations in clinical samples uses a machine learning algorithm to identify cancer-specific structural variations and copy number aberrations in long-read DNA sequencing data.
SAVANA offers rapid and reliable genomic analysis of clinical samples, helping to inform cancer diagnosis and therapeutic interventions.
Because cancer genomes are structurally complex, standard analysis tools often report false positives, which can lead to erroneous clinical interpretations of tumour biology. SAVANA significantly reduces such errors.
To address this challenge, researchers developed SAVANA, a new algorithm, which they recently described in the journal Nature Methods.
This algorithm was developed and tested across 99 human tumour samples by researchers at EMBL’s European Bioinformatics Institute (EMBL-EBI) and the R&D laboratory of Genomics England, in collaboration with clinical partners at University College London (UCL), the Royal National Orthopaedic Hospital (RNOH), Instituto de Medicina Molecular João Lobo Antunes, and Boston Children’s Hospital.
“Because other analysis tools are not developed to account for the particularities of cancer genomics data, they often pick up false positives that could lead to incorrect clinical and biological interpretations,” said Isidro Cortes-Ciriano, Group Leader at EMBL-EBI. “SAVANA changes this. By training the algorithm directly on long-read sequencing data from cancer samples, we created a new method that can tell the difference between true cancer-related genomic alterations and sequencing artefacts, thereby enabling us to elucidate the mutational processes underlying cancer using long-read sequencing with unprecedented resolution.”
Optimised for clinical use
“When we developed SAVANA, our focus was clear: create a tool sophisticated enough to characterise complex cancer genomes but practical enough for clinical use,” explained Hillary Elrick, former Predoctoral Fellow at EMBL-EBI and Postdoctoral Fellow at the Francis Crick Institute.
“As a result, SAVANA can accurately distinguish somatic structural variants, copy number aberrations, tumour purity, and ploidy – all key to understanding tumour biology and guiding clinical treatment decisions,” added Carolin Sauer, Postdoctoral Fellow at EMBL-EBI.
Its rapid analysis and robust error correction make SAVANA well suited for clinical use. The method was recently applied to osteosarcoma, a rare and aggressive bone cancer that mostly affects young people, where it helped researchers uncover new genomic rearrangements and provided novel insights into how the disease evolves and progresses.
The team also compared SAVANA’s results from long-read data with Illumina sequencing data from the same samples, analysed with a whole-genome sequencing pipeline that is used to deliver clinical reports. The findings were highly consistent across technologies, demonstrating that SAVANA performs on par with current clinical standards while revealing additional cancer-relevant alterations.
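To give a sense of what a cross-technology comparison like this can involve, the sketch below counts how many long-read structural variant calls have a matching short-read call nearby. It is an illustration only and not SAVANA's code or the team's benchmarking procedure: the `Breakpoint` record, the `matches` and `concordance` helpers, the 100 bp tolerance, and the example calls are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Breakpoint:
    """Hypothetical minimal record for one structural variant call."""
    chrom: str
    pos: int
    sv_type: str


def matches(a: Breakpoint, b: Breakpoint, tolerance: int = 100) -> bool:
    """Treat two calls as the same event if they share chromosome and type
    and lie within `tolerance` base pairs (an assumed threshold)."""
    return (
        a.chrom == b.chrom
        and a.sv_type == b.sv_type
        and abs(a.pos - b.pos) <= tolerance
    )


def concordance(long_read_calls, short_read_calls, tolerance: int = 100):
    """Split long-read calls into those supported by a short-read call and
    those seen only in long reads, and return the supported fraction."""
    shared = [
        call for call in long_read_calls
        if any(matches(call, other, tolerance) for other in short_read_calls)
    ]
    long_only = [call for call in long_read_calls if call not in shared]
    fraction = len(shared) / len(long_read_calls) if long_read_calls else 0.0
    return shared, long_only, fraction


# Invented example calls, for illustration only.
long_read = [
    Breakpoint("chr1", 1000, "DEL"),
    Breakpoint("chr2", 5000, "INV"),
    Breakpoint("chr3", 7500, "DUP"),
]
short_read = [
    Breakpoint("chr1", 1040, "DEL"),
    Breakpoint("chr2", 5003, "INV"),
]

shared, long_only, fraction = concordance(long_read, short_read)
print(f"{len(shared)} shared, {len(long_only)} long-read only, "
      f"{fraction:.0%} of long-read calls supported")
```

In this toy setup, calls found only in the long-read data would be the candidates for additional alterations, although a real comparison would also need to handle both breakpoints of each rearrangement, differing variant representations, and call filtering.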
“The capability to accurately detect structural variants is transformative for clinical diagnostics,” said Adrienne Flanagan, Professor at UCL and Consultant Histopathologist at RNOH. “SAVANA could help us confidently identify genomic alterations relevant for diagnosis and prognosis. Ultimately, this means we would be better placed to deliver personalised treatments for cancer patients.”