/ code & data
Software and data from the GenomeDataLab:
[ SOFTWARE ] Mutational signatures-related:
HyperClust by David Mas-Ponte. https://github.com/davidmasp/hyperclust
A statistical framework to detect clustered mutations in genomes, while accounting for mutation rate heterogeneity and for the estimated timing of the mutations.
associated with the publication Mas-Ponte & Supek (2020) Nature Genetics "DNA mismatch repair promotes APOBEC3-mediated diffuse hypermutation in human cancers"
CellLineMutSigs by Jurica Levatić. https://github.com/jlevatic/CellLineMutSigs
extracts mutational signatures from cancer cell line genomes and tests their association with drug activity.
associated with the publication Levatić, Salvadores, Fuster & Supek (2022) Nature Comms "Mutational signatures are markers of drug sensitivity of cancer cells"
[ SOFTWARE ] Identifying driver mutations:
MutMatch by Elizaveta Besedina.
https://github.com/ebesedina/mutmatch
Cancer evolutionary model that was used to identify changes in selection upon copy number alterations (CNA) in cancer genomes
see the publication Besedina & Supek (2024) Nature Comms "Copy number losses of oncogenes and gains of tumor suppressor genes generate common driver mutations"
uses WES or WGS data for comparative studies of differential selection on cancer genes.
DiffInvex by Ahmed Khalil.
https://github.com/AISKhalil/diffinvex
DiffInvex (Differential Introns-versus-Exons) is an evolutionary model to identify differential selection on point mutations in WGS data between two or more time points/conditions.
we applied it to compare treated versus untreated cancer genomes and identify drivers that contribute to chemotherapy resistance: Khalil & Supek (2024) bioRxiv.
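To illustrate the general introns-versus-exons idea, here is a minimal sketch that compares the exonic-to-intronic mutation ratio of one gene between two conditions. It is not DiffInvex's actual model or API: the function name, the pseudocount and the example counts are all hypothetical, and a plain log odds ratio stands in for the real statistical framework. Intronic mutations serve as a rough neutral baseline for the local mutation rate.

```python
import math

def exon_intron_log_odds(exon_a, intron_a, exon_b, intron_b, pseudocount=0.5):
    """Log odds ratio of exonic vs intronic mutation counts, condition A vs B.

    Hypothetical helper for illustration only. A positive value means that
    exonic mutations are relatively enriched in condition A, which may hint
    at stronger positive selection there. The pseudocount avoids division
    by zero when a count is 0.
    """
    a = exon_a + pseudocount
    b = intron_a + pseudocount
    c = exon_b + pseudocount
    d = intron_b + pseudocount
    log_or = math.log((a / b) / (c / d))
    # Standard error of the log odds ratio (Woolf approximation)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se

# Made-up counts for one gene: treated (A) versus untreated (B) tumors
log_or, se = exon_intron_log_odds(exon_a=30, intron_a=100, exon_b=10, intron_b=100)
print(f"log odds ratio = {log_or:.2f} (SE {se:.2f})")
```

A real analysis would also need to account for mutation type, trinucleotide context and multiple testing across genes, which this sketch ignores.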
[ SOFTWARE ] Machine learning tools:
FastRandomForest2 (beta) by Jordi Piqué Sellés.
https://github.com/GenomeDataScience/FastRandomForest
A re-implementation of the Random Forest (RF) classifier for the Weka machine learning environment, with substantial improvements in speed and memory use.
[ SOFTWARE ] Bioinformatics tools:
to be released soon: "Pipeline6" by Daniel Naro
T2T reference genome-compatible, extensible and scalable pipeline for cancer genomic data processing, based on Nextflow
Includes the gamut of bioinformatics tools from the Hartwig Medical Foundation "Platinum pipeline" (Sage, GRIDSS, PURPLE...) and several additional tools:
Strelka2 (SNV, indel calling), Manta (SV calling), Paragraph (germline SV genotyping), Sequenza (CNA calling), GangSTR (repeat indel calling) etc.
“To invent, you need a good imagination and a pile of junk.” ― Thomas A. Edison