Important: The SpliceAI variant impact data on the UCSC Genome Browser come directly from
Illumina (see Data Access below). However, since SpliceAI refers to the
algorithm, not to a specific computed dataset, scores from the Broad server or other sources may
differ from those shown here.
Description
SpliceAI is an open-source deep
learning algorithm that predicts splicing probability for nucleotides and
as a result can score DNA variants for splicing impact.
Such variants may activate nearby cryptic splice sites, leading to abnormal transcript isoforms.
SpliceAI was developed at Illumina; a
lookup tool
is provided by the Broad Institute.
The SpliceAI algorithm is run on the genome sequence itself and scores each
nucleotide for the probability that it is a donor or acceptor site, on both the
forward and the reverse strand. Then variants are added and the new sequence is
scored again. The "wildtype" container track shows the scores for the genome
sequence itself and the "variants" container track shows the impact of all
possible variants close to known splice sites. The "wildtype" subtracks are
useful when looking at new transcript models, to evaluate how likely exon
boundaries are. The "variants" subtracks are used to evaluate the impact of
variants on splicing, typically in medical diagnostics.
Why are some variants not scored by SpliceAI?
SpliceAI only annotates variants close to splice sites of genes defined by the
Gencode gene annotation track. Additionally, SpliceAI does not annotate variants that are
close to chromosome ends (within 5 kb on either side), that are deletions longer than
twice the input parameter -D, or that are inconsistent with the reference FASTA file.
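The last three rules can be sketched as a small check. This is an illustration only, not the SpliceAI source code: the function name, its arguments, the 1-based coordinate convention, the exact boundary handling at the 5 kb margin, and the definition of deletion length as len(ref) - len(alt) are all assumptions. The gene-proximity rule is not covered because it requires the Gencode annotation.

```python
from typing import Optional


def skip_reason(pos: int, ref: str, alt: str, chrom_length: int,
                reference_allele: str, max_distance: int,
                end_margin: int = 5000) -> Optional[str]:
    """Return why a variant would be skipped, or None if no rule applies.

    pos is assumed to be 1-based. reference_allele is the sequence found in
    the reference FASTA at pos (a hypothetical input, looked up by the caller).
    max_distance stands in for the -D parameter.
    """
    # Rule 1: variant lies within the margin at either chromosome end.
    if pos <= end_margin or pos > chrom_length - end_margin:
        return "close to chromosome end"
    # Rule 2: deletion longer than twice the -D parameter
    # (deletion length taken as len(ref) - len(alt), an assumption).
    if len(ref) - len(alt) > 2 * max_distance:
        return "deletion longer than 2 * D"
    # Rule 3: REF allele does not match the reference sequence.
    if ref.upper() != reference_allele.upper():
        return "REF inconsistent with reference FASTA"
    return None
```

A caller would pass the value of -D used for the run as max_distance; a return value of None only means that none of these three rules applies.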
What are the differences between masked and unmasked tracks?
The unmasked tracks include splicing changes corresponding to strengthening annotated splice sites
and weakening unannotated splice sites, which are typically much less pathogenic than weakening
annotated splice sites and strengthening unannotated splice sites. The delta scores of such splicing
changes are set to 0 in the masked files. We recommend using the unmasked tracks for alternative
splicing analysis and masked tracks for variant interpretation.
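The masking rule described above can be written as a short function. This is a minimal sketch under stated assumptions, not Illumina's implementation: the function name and the "gain"/"loss" labels are hypothetical, and whether a site is annotated is taken as a given input.

```python
def mask_delta(event: str, site_is_annotated: bool, delta: float) -> float:
    """Return the masked delta score for one splicing change.

    event is "gain" (site strengthened) or "loss" (site weakened).
    """
    if event not in ("gain", "loss"):
        raise ValueError("event must be 'gain' or 'loss'")
    strengthening_annotated = event == "gain" and site_is_annotated
    weakening_unannotated = event == "loss" and not site_is_annotated
    # These two kinds of change are set to 0 in the masked files.
    if strengthening_annotated or weakening_unannotated:
        return 0.0
    # Weakening annotated sites and strengthening unannotated sites are kept.
    return delta
```

With this rule, a donor loss of 0.6 at an annotated donor keeps its score, while a donor gain of 0.6 at the same annotated donor becomes 0.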
Display Conventions and Interpretation
Variants are colored according to the splicing impact categories of Walker et al. 2023:
- Predicted impact on splicing: Score >= 0.2
- Not informative: Score < 0.2 and > 0.1
- No impact on splicing: Score <= 0.1
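The three categories above can be expressed as a simple threshold function. This is a sketch for illustration; the function name and the returned labels are assumptions, and only the thresholds come from the list above.

```python
def classify_splice_impact(score: float) -> str:
    """Map a SpliceAI score to one of the three display categories."""
    if score >= 0.2:
        return "predicted impact on splicing"
    if score > 0.1:
        return "not informative"
    return "no impact on splicing"
```

Note that the boundaries are inclusive on the outer categories: a score of exactly 0.2 counts as predicted impact and a score of exactly 0.1 counts as no impact.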
Hovering over an item shows the variant, gene name, type of change (donor gain/loss, acceptor
gain/loss), location of the affected cryptic splice site, and SpliceAI score. Clicking on any item brings up
a table with this information.
The scores range from 0 to 1 and can be interpreted as the
probability of the variant being splice-altering. The SpliceAI paper (Jaganathan et al. 2019)
provides a detailed characterization of the 0.2 (high recall), 0.5 (recommended), and 0.8 (high precision) cutoffs.
Methods
The data were downloaded from Illumina.
The SpliceAI scores are represented in the VCF INFO field as
SpliceAI=G|OR4F5|0.01|0.00|0.00|0.00|-32|49|-40|-31
Here, the pipe-separated fields contain
- ALT allele
- Gene name
- Acceptor gain score
- Acceptor loss score
- Donor gain score
- Donor loss score
- Relative location of affected cryptic acceptor
- Relative location of affected acceptor
- Relative location of affected cryptic donor
- Relative location of affected donor
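The field layout above can be unpacked with a few lines of Python. This is a minimal sketch, assuming exactly ten pipe-separated fields that are all present and numeric where expected; the class, function, and attribute names are hypothetical and not part of any SpliceAI or UCSC tool.

```python
from typing import NamedTuple


class SpliceAIRecord(NamedTuple):
    alt: str
    gene: str
    acceptor_gain: float
    acceptor_loss: float
    donor_gain: float
    donor_loss: float
    pos_acceptor_gain: int  # relative location of affected cryptic acceptor
    pos_acceptor_loss: int  # relative location of affected acceptor
    pos_donor_gain: int     # relative location of affected cryptic donor
    pos_donor_loss: int     # relative location of affected donor


def parse_spliceai(value: str) -> SpliceAIRecord:
    """Parse one 'SpliceAI=...' INFO entry into named fields."""
    prefix = "SpliceAI="
    if value.startswith(prefix):
        value = value[len(prefix):]
    fields = value.split("|")
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}")
    scores = [float(x) for x in fields[2:6]]
    positions = [int(x) for x in fields[6:10]]
    return SpliceAIRecord(fields[0], fields[1], *scores, *positions)


record = parse_spliceai("SpliceAI=G|OR4F5|0.01|0.00|0.00|0.00|-32|49|-40|-31")
print(record.gene, record.acceptor_gain, record.pos_donor_loss)
```

For the example entry, this yields the gene OR4F5 with an acceptor gain score of 0.01 at relative position -32.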
Since most of the values are 0 or almost 0, we selected only those variants
with a score equal to or greater than 0.02.
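The cutoff can be illustrated with a short filter. This sketch assumes that "score" means the largest of the four gain/loss scores in the entry; that reading, and the function names, are assumptions rather than a description of the actual track-building code.

```python
def max_delta_score(value: str) -> float:
    """Return the largest of the four gain/loss scores in a SpliceAI entry."""
    # Drop an optional 'SpliceAI=' prefix, then split the pipe-separated fields.
    fields = value.split("=", 1)[-1].split("|")
    return max(float(x) for x in fields[2:6])


def passes_cutoff(value: str, cutoff: float = 0.02) -> bool:
    """True if the entry's maximum score is at or above the cutoff."""
    return max_delta_score(value) >= cutoff
```

Under this reading, the example entry shown earlier, whose highest score is 0.01, would not pass the 0.02 cutoff.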
The complete processing of this track can be found in the
makedoc.
Data Access
These data are not available for download from the Genome Browser.
The raw data can be found directly on
Illumina.
See below for a copy of the license restrictions pertaining to these data.
License
FOR ACADEMIC AND NOT-FOR-PROFIT RESEARCH USE ONLY. The SpliceAI scores are
made available by Illumina only for academic or not-for-profit research only.
By accessing the SpliceAI data, you acknowledge and agree that you may only
use this data for your own personal academic or not-for-profit research only,
and not for any other purposes. You may not use this data for any for-profit,
clinical, or other commercial purpose without obtaining a commercial license
from Illumina, Inc.
Credits
Thanks to Illumina for making the data available. Thanks to Michael Hiller, Francois Lecoquierre and
Jean-Madeleine de Sainte Agathe for making available and suggesting the SpliceAI wildtype tracks.
References
Jaganathan K, Kyriazopoulou Panagiotopoulou S, McRae JF, Darbandi SF, Knowles D, Li YI, Kosmicki JA,
Arbelaez J, Cui W, Schwartz GB et al.
Predicting Splicing from Primary Sequence with Deep Learning.
Cell. 2019 Jan 24;176(3):535-548.e24.
PMID: 30661751
Walker LC, Hoya M, Wiggins GAR, Lindy A, Vincent LM, Parsons MT, Canson DM, Bis-Brewer D, Cass A,
Tchourbanov A et al.
Using the ACMG/AMP framework to capture evidence related to predicted and observed impact on
splicing: Recommendations from the ClinGen SVI Splicing Subgroup.
Am J Hum Genet. 2023 Jul 6;110(7):1046-1067.
PMID: 37352859; PMC: PMC10357475