Download

Available Datasets for Download

The VCF files were obtained from the following databases: ClinVar, 1000GP, DepMap, GDC, and dbSNP.
Please cite our work when using these datasets.
Recommended compressed formats for the "Frameshift Sequences" and "FS-control Sequences" archives:

Dataset             RAR      TAR.XZ    TAR.GZ
Protein Sequences   92M      117M      487M
CDS Sequences       520M     327M      686M
Group                      Link               Species        Counts      Size     Release
Frameshift-Protein         PRO-FS-20260318    Homo sapiens   1,702,586   1.43G    2026.3.18
Wildtype-Protein           PRO-FSC-20260318   Homo sapiens   45,009      28.46M   2026.3.18
Frameshift-CDS             CDS-FS-20260318    Homo sapiens   1,702,586   3.71G    2026.3.18
FS-control-Sequences-CDS   CDS-FSC-20260318   Homo sapiens   45,272      84.08M   2026.3.18
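As an optional post-download check, the following minimal sketch counts the records in each file inside a TAR.XZ or TAR.GZ archive without extracting it to disk. This page does not state the file format inside the archives. The sketch assumes FASTA files in which each record begins with a ">" header line. The function name count_fasta_records_in_archive and any archive paths are hypothetical. RAR archives cannot be read by Python's standard tarfile module and would need a separate tool.

```python
import tarfile


def count_fasta_records_in_archive(archive_path):
    """Hypothetical helper: map each regular file in a .tar.xz/.tar.gz
    archive to the number of lines starting with '>' (assumed FASTA headers)."""
    counts = {}
    # Mode "r:*" lets tarfile detect xz or gzip compression automatically.
    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories, links, etc.
            handle = tar.extractfile(member)
            if handle is None:
                continue
            with handle:
                # Stream the member line by line (as bytes) instead of loading it all.
                counts[member.name] = sum(1 for line in handle if line.startswith(b">"))
    return counts
```

For example, calling count_fasta_records_in_archive on a downloaded TAR.XZ file (path hypothetical) returns a dictionary of per-file record counts. If the "Counts" column above counts one FASTA record per sequence, the two numbers can be compared as a rough sanity check.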