SERtool User Guide

This tool identifies enriched regions in biological sequences. Users can flexibly define custom parameters to query regions of interest, and leverage precomputed mRNA and protein sequences derived from frameshift mutations to investigate potential links between these enriched regions and biological functions or disease mechanisms.

1. Run SERtool

Effective use of SERtool is based on three key components:

Pattern: The search pattern can be defined for either amino acid or nucleotide sequences, and takes one of three forms: (1) a fixed motif (PR / GR), (2) a degenerate motif using equivalent residues (D-E / PR-GR), or (3) a hybrid pattern combining fixed positions with wildcard symbols (H.R). This flexibility enables biologically informed queries that account for sequence conservation and variability at both protein and nucleic acid levels.
Organism: SERtool includes precomputed frameshifted proteome sequences for each supported organism, enabling direct scanning of potential alternative protein products resulting from translational frameshifts.
Parameters: Detection sensitivity is controlled by parameter pairs denoted as (hit, window), where a candidate region is considered significant if it contains at least hit occurrences of the query pattern within a sliding window of length window. Users may either specify individual (hit, window) points or define a range to automatically generate a series of such parameter combinations for comprehensive scanning.
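The (hit, window) criterion can be illustrated with a minimal Python sketch. This is not SERtool's own implementation. The function names, the 0-based half-open coordinates, the choice to count overlapping occurrences, and the requirement that an occurrence lie entirely inside the window are all assumptions made for illustration. The pattern syntax is read as described above: `-` separates equivalent alternatives and `.` is a wildcard.

```python
import re

def pattern_to_regex(pattern):
    """Translate a query such as 'PR', 'D-E', 'PR-GR' or 'H.R' into a regex.
    Assumed syntax: '-' separates equivalent alternatives, '.' matches any residue."""
    alternatives = pattern.split("-")
    body = "|".join(
        "".join("." if ch == "." else re.escape(ch) for ch in alt)
        for alt in alternatives
    )
    # A lookahead lets finditer report every start position, overlaps included.
    return re.compile(f"(?=({body}))")

def find_enriched_regions(sequence, pattern, hit, window):
    """Return merged (start, end) regions in which some window of length
    `window` contains at least `hit` complete occurrences of the pattern."""
    regex = pattern_to_regex(pattern)
    spans = [(m.start(1), m.end(1)) for m in regex.finditer(sequence)]
    regions = []
    for i in range(len(sequence) - window + 1):
        # Count occurrences lying entirely inside the window [i, i + window).
        count = sum(1 for s, e in spans if s >= i and e <= i + window)
        if count >= hit:
            if regions and i <= regions[-1][1]:
                # Overlapping or adjacent qualifying windows form one region.
                regions[-1] = (regions[-1][0], i + window)
            else:
                regions.append((i, i + window))
    return regions

# Hypothetical sequences, for illustration only.
print(find_enriched_regions("A" * 5 + "E" * 20 + "A" * 5, "E", 14, 20))  # [(0, 30)]
print(find_enriched_regions("MAAKDEL", "KDEL", 1, 4))                    # [(3, 7)]
```

The scan recounts occurrences for every window, which keeps the sketch short; a production scan would typically use a running count instead.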

1.1 Quick Search

Set the query pattern, organism, and parameters. By default, a single-amino-acid pattern uses (hit, window) = (14, 20): any 20-residue window containing at least 14 occurrences of the residue is reported as a target region.

1.2 Advanced Settings

The example scenarios below illustrate potential applications. Every analysis can be carried out in Fixed Mode alone; the other modes only automate the evaluation of multiple (hit, window) parameter points, which makes a more comprehensive search easier.

Condition	Example	Start	End	Mode	Point	Note
Single Class Amino Acid Enrichment	`D-E`	20	100	Formula	`(18,20)(30,50)`	DDDDEEEEEE
Fixed Sequence Enrichment	`DE`	20	100	Formula	`(9,20)(15,50)`	DEDEDEDEDE
Wildcard Containing Enrichment	`H.K`	20	100	Formula	`(4,20)(8,50)`	HTKHSK
Continuous Sequence Enrichment	`R`	X	X	Fixed	`(9,9)`	RRRRRRRRR
Motif Sequence	`KDEL`	X	X	Fixed	`(1,4)`	KDEL
Score = 10	`E`	20	100	Score	Built-in defaults	EEEEEEEEEEDE...
Proportion = 0.7	`E`	20	100	Proportion	Built-in defaults	EEEDEDEDEE...
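The sketch below shows one way a range could be expanded into a series of (hit, window) points that are then each run as in Fixed Mode. It rests on assumptions this guide does not confirm: that Formula mode draws a straight line through the two listed points, that Proportion mode uses hit = ceil(proportion × window), that thresholds are rounded up, and that window lengths advance by a fixed `step` (a hypothetical parameter). SERtool's actual rules may differ.

```python
import math
from fractions import Fraction

def formula_points(p1, p2, start, end, step=10):
    """Assumed reading of Formula mode: interpolate the hit threshold
    linearly through two (hit, window) points for windows start..end."""
    (h1, w1), (h2, w2) = p1, p2
    slope = Fraction(h2 - h1, w2 - w1)  # exact arithmetic avoids float rounding
    return [(math.ceil(h1 + slope * (w - w1)), w)
            for w in range(start, end + 1, step)]

def proportion_points(proportion, start, end, step=10):
    """Assumed reading of Proportion mode: hit = ceil(proportion * window)."""
    p = Fraction(str(proportion))
    return [(math.ceil(p * w), w) for w in range(start, end + 1, step)]

# Points from the 'Single Class Amino Acid Enrichment' row, windows 20 to 100.
print(formula_points((18, 20), (30, 50), 20, 100, step=40))
# [(18, 20), (34, 60), (50, 100)]
print(proportion_points(0.7, 20, 100, step=40))
# [(14, 20), (42, 60), (70, 100)]
```

Under these assumptions, a proportion of 0.7 at a window of 20 reproduces the Quick Search default of (14, 20).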

2. Frameshift Sequences Database

SERtool integrates variant data from ClinVar, DepMap, the 1000 Genomes Project, Genomic Data Commons (GDC), and the Single Nucleotide Polymorphism Database (dbSNP). It includes 1,355,621 computationally translated frameshift-altered protein sequences and 1,355,780 mRNA sequences, enabling precise identification of frameshift-induced sequence-enriched regions (SERs). These sequences can be accessed directly in the Organism section of the SERtool Advanced Settings page. The full dataset is also available for download.

Frameshift Sequences Dataset: (1) Frameshift-Protein; (2) Wildtype-Protein; (3) Frameshift-CDS; (4) Wildtype-CDS.

Workflow for generating frameshift-altered sequences and primary database characterization
(A) Overview of data integration and processing pipeline. (B) Distribution of length differences between frameshift-mutated and wild-type protein sequences. (C) Proportion of frameshift variants classified as pathogenic/likely pathogenic versus variants of uncertain significance in ClinVar. (D) Top 10 diseases associated with frameshift variants based on ClinVar submission counts.

3. Code Availability and Citation

The open-source code for SERtool is publicly available at https://github.com/rennmeng/SERtool, and the tool is registered at https://bio.tools/sertool. Please cite the following when using this tool or dataset:

SERtool: An Integrated Framework for Systematic Analysis of Frameshift-Induced Remodeling of Sequence-Enriched Regions. Unpublished.
(Meng Ren, Yanfei Chen, Yun Liu, Fanchun Yang, Jing Shu, Xin Huang, Liangqian Huang*)