Package org.snpeff.snpEffect
Class HgvsDna
java.lang.Object
org.snpeff.snpEffect.Hgvs
org.snpeff.snpEffect.HgvsDna
Coding DNA reference sequence
References: http://www.hgvs.org/mutnomen/recs.html
Nucleotide numbering:
- there is no nucleotide 0
- nucleotide 1 is the A of the ATG-translation initiation codon
- the nucleotide 5' of the ATG-translation initiation codon is -1, the previous -2, etc.
- the nucleotide 3' of the translation stop codon is *1, the next *2, etc.
- intronic nucleotides (coding DNA reference sequence only)
  - beginning of the intron: the number of the last nucleotide of the preceding exon, a plus sign and the position in the intron, like c.77+1G, c.77+2T, ...
  - end of the intron: the number of the first nucleotide of the following exon, a minus sign and the position upstream in the intron, like ..., c.78-2A, c.78-1G
  - in the middle of the intron, numbering changes from "c.77+.." to "c.78-.."; for introns with an uneven number of nucleotides the central nucleotide is the last described with a "+" (see Discussion)
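The coding DNA numbering rules above can be sketched in Java as follows. This is an illustration only, not the code in HgvsDna: the class name `CodingDnaNumberingSketch`, its methods and all parameters are hypothetical. It assumes strand orientation has already been resolved, that positions are given in transcript space, and that the exon bases flanking the intron are not in the 3'UTR (no "*" positions).

```java
/** Hypothetical helper illustrating HGVS coding DNA numbering; not part of SnpEff. */
final class CodingDnaNumberingSketch {

    /**
     * Exonic position.
     * cdsOffset: 0-based offset from the A of the ATG in transcript space (negative means 5' of the ATG).
     * cdsLength: number of nucleotides from the A of the ATG through the last base of the stop codon.
     */
    static String exonic(int cdsOffset, int cdsLength) {
        if (cdsOffset < 0) return "c." + cdsOffset;                              // c.-1, c.-2, ...
        if (cdsOffset >= cdsLength) return "c.*" + (cdsOffset - cdsLength + 1);  // c.*1, c.*2, ...
        return "c." + (cdsOffset + 1);                                           // there is no nucleotide 0
    }

    /**
     * Intronic position.
     * lastExonBase: number of the last nucleotide of the preceding exon (e.g. 77).
     * firstExonBase: number of the first nucleotide of the following exon (e.g. 78).
     * intronLength: number of nucleotides in the intron.
     * k: 1-based position in the intron, counted from its 5' end.
     */
    static String intronic(int lastExonBase, int firstExonBase, int intronLength, int k) {
        if (k < 1 || k > intronLength) throw new IllegalArgumentException("k is outside the intron");
        // For an odd-length intron the central nucleotide is the last one described with "+".
        int plusSide = (intronLength + 1) / 2;
        if (k <= plusSide) return "c." + lastExonBase + "+" + k;
        return "c." + firstExonBase + "-" + (intronLength - k + 1);
    }

    public static void main(String[] args) {
        System.out.println(exonic(0, 300));          // A of the ATG
        System.out.println(exonic(-1, 300));         // first nucleotide 5' of the ATG
        System.out.println(exonic(300, 300));        // first nucleotide 3' of the stop codon
        System.out.println(intronic(77, 78, 5, 3));  // central nucleotide of a 5-nt intron
        System.out.println(intronic(77, 78, 5, 4));  // first nucleotide on the "-" side
    }
}
```

The same `intronic` call covers introns in the 5'UTR: with `lastExonBase = -15` and `firstExonBase = -14` it yields descriptions such as c.-15+1 and c.-14-1.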
Genomic reference sequence
- nucleotide numbering starts with 1 at the first nucleotide of the sequence
NOTE: the sequence should include all nucleotides covering the sequence (gene) of interest and should start well 5' of the promoter of a gene
- no +, - or other signs are used
- when the complete genomic sequence is not known, a coding DNA reference sequence should be used
- for all descriptions the most 3' position possible is arbitrarily assigned to have been changed (see Exception)
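The rule in the last item (assign the change to the most 3' position possible) can be illustrated with a small Java sketch. It is not taken from SnpEff: the class `ThreePrimeShiftSketch` and the method `shiftDeletion` are hypothetical, coordinates are assumed to be 0-based and end-exclusive on a sequence written 5' to 3', only deletions are handled, and the exception mentioned in the text is not modelled.

```java
/** Hypothetical illustration of the "most 3' position" rule for a deletion; not part of SnpEff. */
final class ThreePrimeShiftSketch {

    /**
     * Shift a deleted interval [start, end) as far 3' as possible without
     * changing the resulting sequence. Returns {start, end} of the shifted interval.
     */
    static int[] shiftDeletion(String seq, int start, int end) {
        if (start < 0 || start >= end || end > seq.length()) {
            throw new IllegalArgumentException("invalid interval");
        }
        // Sliding the window one base 3' leaves the result unchanged whenever the
        // base leaving the window equals the base entering it.
        while (end < seq.length() && seq.charAt(start) == seq.charAt(end)) {
            start++;
            end++;
        }
        return new int[] {start, end};
    }

    public static void main(String[] args) {
        // Deleting one T from the TTT run in ACTTTG is reported at the last T.
        int[] shifted = shiftDeletion("ACTTTG", 2, 3);
        System.out.println("g." + (shifted[0] + 1) + "del");
    }
}
```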
Field Summary
Fields
public static boolean debug
Fields inherited from class org.snpeff.snpEffect.Hgvs
duplication, genome, hgvsTrId, marker, MAX_SEQUENCE_LEN_HGVS, strandMinus, strandPlus, tr, variant, variantEffect
Constructor Summary
Constructors
Method Summary
Modifier and Type | Method | Description
protected String | alt() |
protected String | dnaBaseChange | DNA level base changes
protected boolean | isDuplication() | Is this a duplication?
protected String | pos() | Genomic position for exonic variants
protected String | pos(int pos) | HGVS position based on genomic coordinates (chr is assumed to be the same as in transcript/marker).
protected String | posDownstream(int pos) | Position downstream of the transcript
protected String | posExon(int pos) | Convert genomic position to HGVS compatible (DNA) position
protected String | posIntron | Intronic position
protected String | posUpstream(int pos) | Position upstream of the transcript. Note: How to calculate the upstream position: if the strand is '-', as for NM_016176.3, with "genomicTxStart" being the rightmost transcript coordinate: cDotUpstream = -(cdsStart + variantPos - genomicTxStart) instead of "-(variantPos - genomicCdsStart)". The method that stays in transcript space until extending beyond the transcript is correct because of these statements on http://varnomen.hgvs.org/bg-material/numbering/: nucleotides upstream (5') of the ATG-translation initiation codon (start) are marked with a "-" (minus) and numbered c.-1, c.-2, c.-3, etc.
protected String | posUtr3(int pos) | Position within 3'UTR
protected String | posUtr5(int pos) | Position within 5'UTR
protected String | prefixTranslocation | Translocation nomenclature.
protected String | ref() |
String | toString() |
protected String | typeOfReference | Prefix for coding or non-coding sequences
Methods inherited from class org.snpeff.snpEffect.Hgvs
initStrand, parseTranscript, removeTranscript
Field Details
debug
public static boolean debug
Constructor Details
HgvsDna
Method Details
alt
dnaBaseChange
DNA level base changes
isDuplication
protected boolean isDuplication()
Is this a duplication?
pos
Genomic position for exonic variants
pos
HGVS position based on genomic coordinates (chr is assumed to be the same as in transcript/marker).
posDownstream
Position downstream of the transcript
posExon
Convert genomic position to HGVS compatible (DNA) position
posIntron
Intronic position
posUpstream
Position upstream of the transcript.
Note: How to calculate the upstream position:
If the strand is '-', as for NM_016176.3, with "genomicTxStart" being the rightmost transcript coordinate:
cDotUpstream = -(cdsStart + variantPos - genomicTxStart)
instead of "-(variantPos - genomicCdsStart)".
The method that stays in transcript space until extending beyond the transcript is correct because of these statements on http://varnomen.hgvs.org/bg-material/numbering/:
* Nucleotides upstream (5') of the ATG-translation initiation codon (start) are marked with a "-" (minus) and numbered c.-1, c.-2, c.-3, etc. (i.e. going further upstream).
* Question: When the ATG translation initiation codon is in exon 2, and we find a variant in exon 1, should we include intron 1 (upstream of c.-14) in nucleotide numbering? (Isabelle Touitou, Montpellier, France)
  Answer: Nucleotides in introns 5' of the ATG translation initiation codon (i.e. in the 5'UTR) are numbered as introns in the protein coding sequence (see coding DNA numbering). In your example, based on a coding DNA reference sequence, the intron is present between nucleotides c.-15 and c.-14. The nucleotides for this intron are numbered as c.-15+1, c.-15+2, c.-15+3, ..., c.-14-3, c.-14-2, c.-14-1. Consequently, regarding the question, when a coding DNA reference sequence is used, the intronic nucleotides are not counted.
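The formula in the note can be sketched in Java as below. This is an interpretation, not the SnpEff implementation: it assumes that `cdsStart` in the note is the number of exonic nucleotides between the transcript start and the A of the ATG (the spliced 5'UTR length), and it mirrors the minus-strand formula for the plus strand. The class name `UpstreamPositionSketch` and all parameter names are hypothetical.

```java
/** Hypothetical illustration of upstream numbering that stays in transcript space; not part of SnpEff. */
final class UpstreamPositionSketch {

    /**
     * utr5ExonicLength: exonic nucleotides 5' of the A of the ATG (assumed meaning of "cdsStart").
     * genomicTxStart: genomic coordinate of the first transcribed base
     *                 (the rightmost transcript coordinate on the minus strand).
     * variantPos: genomic coordinate of the variant, upstream of the transcript.
     */
    static String cDotUpstream(int utr5ExonicLength, int genomicTxStart, int variantPos, boolean strandMinus) {
        // Distance beyond the transcript start, measured in the 5' direction.
        int beyondTx = strandMinus ? variantPos - genomicTxStart : genomicTxStart - variantPos;
        if (beyondTx <= 0) throw new IllegalArgumentException("variant is not upstream of the transcript");
        // Introns inside the 5'UTR are not counted, only exonic bases plus the distance beyond the transcript.
        return "c.-" + (utr5ExonicLength + beyondTx);
    }

    public static void main(String[] args) {
        // Minus strand, 10 exonic 5'UTR bases, variant one base beyond the transcript start.
        System.out.println(cDotUpstream(10, 1000, 1001, true));
    }
}
```

Under these assumptions the first transcribed base is c.-10, so the base immediately upstream of it is c.-11, however many intronic bases lie between the transcript start and the ATG on the genome.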
posUtr3
Position within 3'UTR
posUtr5
Position within 5'UTR
prefixTranslocation
Translocation nomenclature.
From HGVS: Translocations are described at the molecular level using the format "t(X;4)(p21.2;q34)", followed by the usual numbering, indicating the position of the translocation breakpoint. The sequences of the translocation breakpoints need to be submitted to a sequence database (GenBank, EMBL, DDBJ) and the accession.version numbers should be given (see Discussion).
E.g.: t(X;4)(p21.2;q35)(c.857+101_857+102) denotes a translocation breakpoint in the intron between coding DNA nucleotides 857+101 and 857+102, joining chromosome bands Xp21.2 and 4q34
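As a minimal sketch of the quoted format, the following Java snippet assembles a translocation description from its parts. The class `TranslocationSketch`, the method `describe` and its parameters are hypothetical and do not reflect how prefixTranslocation is implemented; the example values combine the bands from the format string with the breakpoint from the example.

```java
/** Hypothetical illustration of the "t(X;4)(p21.2;q34)" format; not part of SnpEff. */
final class TranslocationSketch {

    /** Chromosomes first, then bands in the same order, then the breakpoint in the usual numbering. */
    static String describe(String chrA, String bandA, String chrB, String bandB, String breakpoint) {
        return "t(" + chrA + ";" + chrB + ")(" + bandA + ";" + bandB + ")(" + breakpoint + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe("X", "p21.2", "4", "q34", "c.857+101_857+102"));
    }
}
```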
ref
toString
typeOfReference
Prefix for coding or non-coding sequences