Download

Available Datasets for Download

The VCF files were obtained from the following databases: ClinVar, 1000GP, DepMap, GDC, and dbSNP.
Please cite our work when using these datasets.

Recommended compressed formats for "Frameshift Sequences" and "Wildtype Sequences":

Protein Sequences: RAR (52.01M), TAR.XZ (45.07M), TAR.GZ (77.15M)
CDS Sequences: RAR (57.57M), TAR.XZ (51.59M), TAR.GZ (209.07M)
Group               Species       Counts     Size     Release
Frameshift-Protein  Homo sapiens  1,355,628  1.04G    2026.3.30
Wildtype-Protein    Homo sapiens  63,527     36.10M   2026.3.30
Frameshift-CDS      Homo sapiens  1,355,876  2.77G    2026.3.30
Wildtype-CDS        Homo sapiens  63,528     105.68M  2026.3.30
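
Once a TAR.XZ or TAR.GZ archive has been saved locally, it can be unpacked with Python's standard library. The following is a minimal sketch, not part of the original page: the function name extract_archive and the file name in the usage comment are hypothetical, and the actual archive names and internal layout may differ. RAR archives are not handled here because the standard library has no RAR support.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a .tar.xz or .tar.gz archive into dest_dir; return member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Mode "r:*" lets tarfile detect the compression (xz, gz, ...) itself.
    with tarfile.open(archive_path, mode="r:*") as archive:
        names = archive.getnames()
        # Refuse members whose path would resolve outside dest_dir.
        root = dest.resolve()
        for name in names:
            target = (dest / name).resolve()
            if target != root and root not in target.parents:
                raise ValueError(f"unsafe path in archive: {name}")
        archive.extractall(dest)
    return names


# Usage (hypothetical file name; substitute the archive actually downloaded):
# extract_archive("protein_sequences.tar.xz", "data/protein")
```

Checking member paths before extraction is a precaution that applies to any downloaded archive, not something specific to these datasets.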