Package org.snpeff.snpEffect.factory
Class SnpEffPredictorFactory
java.lang.Object
org.snpeff.snpEffect.factory.SnpEffPredictorFactory
Direct Known Subclasses:
SnpEffPredictorFactoryFeatures, SnpEffPredictorFactoryGenesFile, SnpEffPredictorFactoryGff, SnpEffPredictorFactoryKnownGene, SnpEffPredictorFactoryRefSeq
This class creates a SnpEffectPredictor from a file (or a set of files) and a configuration.
Author:
pcingola
Field Summary
Modifier and Type | Field
static final int | MARK
static int | MIN_TOTAL_FRAME_COUNT
Constructor Summary
Constructor
SnpEffPredictorFactory
Method Summary
Modifier and Type | Method | Description
protected void | add |
protected void | add(Chromosome chromo) |
protected Exon | add | Add an exon.
protected void | add | Add a Gene.
protected void | add | Add a generic Marker.
protected void | add(Transcript tr) | Add a transcript.
protected void | addMarker | Add a marker to the collection.
protected void | addSequences(String chr, String chrSeq) | Add genomic reference sequences.
protected void | adjustChromosomes() | Adjust chromosome lengths using gene information.
protected void | adjustTranscripts() | Adjust transcripts: recalculate start, end, strand, etc.
protected void | beforeExonSequences() | Perform some actions before reading sequences.
protected void | codingFromCds() | Only coding transcripts have CDS: make sure that transcripts having CDS are marked as protein coding.
protected void | collapseZeroLenIntrons() | Collapse exons that have zero-size introns between them.
abstract SnpEffectPredictor | create() |
protected void | createRandSequences() | Create random sequences for exons. Note: This is only used for test cases!
protected void | deleteRedundant() | Consolidate transcripts: if two exons are right next to each other, join them.
protected void | exonsFromCds() | Create exons from CDS info.
protected void | exonsFromCds | Create exons from CDS info. WARNING: We might end up with redundant exons if some exons existed before this process.
protected Gene | findGene |
protected Gene | findGene |
protected Marker | findMarker(String id) |
protected Transcript | findTranscript(String id) |
protected Transcript | findTranscript(String trId, String id) |
protected Chromosome | getOrCreateChromosome(String chromoName) | Get a chromosome.
protected int | parsePosition(String posStr) | Parse a string as a 'position'.
protected void | readExonSequences() | Read exon sequences from a FASTA file.
protected void | replaceTranscript(Transcript trOld, Transcript trNew) |
void | setCircularCorrectLargeGap(boolean circularCorrectLargeGap) |
void | setCreateRandSequences(boolean createRandSequences) |
void | setDebug(boolean debug) |
void | setFastaFile(String fastaFile) |
void | setFileName(String fileName) |
void | setRandom |
void | setReadSequences(boolean readSequences) | Whether to read sequences. Note: This is only used for debugging and testing.
void | setStoreSequences(boolean storeSequences) |
void | setVerbose(boolean verbose) |
protected String | showChromoNamesDifferences | Show differences in chromosome names.
Field Details
MARK
public static final int MARK
MIN_TOTAL_FRAME_COUNT
public static int MIN_TOTAL_FRAME_COUNT
Constructor Details
SnpEffPredictorFactory
Method Details
add
add
add
Add an exon.
Parameters:
exon
Returns:
the exon added. Note: If an exon with the same ID exists, the old exon is returned. If an exon with the same ID and the same coordinates exists, a new exon with a different ID is added.
add
Add a Gene.
add
Add a generic Marker.
add
Add a transcript.
addMarker
Add a marker to the collection.
addSequences
Add genomic reference sequences.
adjustChromosomes
protected void adjustChromosomes()
Adjust chromosome lengths using gene information. This is used when the sequence is not available (which makes sense only for test cases and debugging).
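To make the idea concrete, here is a minimal Java sketch of inferring chromosome lengths when no sequence is loaded. It assumes zero-based, inclusive coordinates and that a chromosome only needs to be as long as its furthest gene end; `ChromosomeLengthSketch` and `GeneInterval` are hypothetical names, not SnpEff classes.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch (not SnpEff code): infer chromosome lengths from gene coordinates. */
final class ChromosomeLengthSketch {

    /** Hypothetical stand-in for a gene: chromosome name plus zero-based, inclusive coordinates. */
    static final class GeneInterval {
        final String chromosome;
        final int start;
        final int end;

        GeneInterval(String chromosome, int start, int end) {
            this.chromosome = chromosome;
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Returns a copy of 'known' in which every chromosome is at least long enough
     * to contain all of its genes. Existing lengths are never shrunk.
     */
    static Map<String, Integer> adjustLengths(Map<String, Integer> known, List<GeneInterval> genes) {
        Map<String, Integer> lengths = new HashMap<>(known);
        for (GeneInterval gene : genes) {
            // Assumption: inclusive zero-based end, so a gene ending at 'end' needs length end + 1
            lengths.merge(gene.chromosome, gene.end + 1, Integer::max);
        }
        return lengths;
    }
}
```

Using `merge` with `Integer::max` keeps any length already known and only grows chromosomes that a gene would otherwise overflow.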
adjustTranscripts
protected void adjustTranscripts()
Adjust transcripts: recalculate start, end, strand, etc.
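The recalculation step can be pictured with the hedged sketch below, which sets the transcript start and end to the outermost exon coordinates. How SnpEff actually resolves the strand is not described here, so the majority vote is an assumption, as are the hypothetical `TranscriptAdjustSketch`, `Exon` and `Span` types.

```java
import java.util.List;

/** Hypothetical sketch (not SnpEff code): recompute a transcript's span and strand from its exons. */
final class TranscriptAdjustSketch {

    /** Hypothetical exon with zero-based, inclusive coordinates. */
    static final class Exon {
        final int start;
        final int end;
        final boolean strandMinus;

        Exon(int start, int end, boolean strandMinus) {
            this.start = start;
            this.end = end;
            this.strandMinus = strandMinus;
        }
    }

    /** Recomputed transcript coordinates and strand. */
    static final class Span {
        final int start;
        final int end;
        final boolean strandMinus;

        Span(int start, int end, boolean strandMinus) {
            this.start = start;
            this.end = end;
            this.strandMinus = strandMinus;
        }
    }

    static Span adjust(List<Exon> exons) {
        if (exons.isEmpty()) throw new IllegalArgumentException("transcript has no exons");
        int start = Integer.MAX_VALUE;
        int end = Integer.MIN_VALUE;
        int minusCount = 0;
        for (Exon e : exons) {
            // The transcript must cover its leftmost and rightmost exons
            start = Math.min(start, e.start);
            end = Math.max(end, e.end);
            if (e.strandMinus) minusCount++;
        }
        // Assumption: strand is decided by majority vote over exons (ties go to the plus strand)
        boolean strandMinus = minusCount * 2 > exons.size();
        return new Span(start, end, strandMinus);
    }
}
```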
beforeExonSequences
protected void beforeExonSequences()
Perform some actions before reading sequences.
codingFromCds
protected void codingFromCds()
Only coding transcripts have CDS: make sure that transcripts having CDS are marked as protein coding. This might not always be "precise", though; counting the second column of the CDS lines in a GTF file shows CDS entries under types other than protein_coding:
$ grep CDS genes.gtf | cut -f 2 | ~/snpEff/scripts/uniqCount.pl
    113 IG_C_gene
     64 IG_D_gene
     24 IG_J_gene
    366 IG_V_gene
     21 TR_C_gene
      3 TR_D_gene
     82 TR_J_gene
    296 TR_V_gene
    461 non_stop_decay
  63322 nonsense_mediated_decay
    905 polymorphic_pseudogene
     34 processed_transcript
1340112 protein_coding
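A minimal sketch of the rule described above, using hypothetical `CodingFromCdsSketch.Transcript` objects rather than SnpEff's own `Transcript` class: any transcript with at least one CDS interval is flagged as protein coding.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch (not SnpEff code): flag transcripts that have CDS as protein coding. */
final class CodingFromCdsSketch {

    /** Hypothetical transcript holding its CDS intervals as {start, end} pairs. */
    static final class Transcript {
        final String id;
        final List<int[]> cds = new ArrayList<>();
        boolean proteinCoding;

        Transcript(String id) {
            this.id = id;
        }
    }

    /** Marks every transcript with at least one CDS as protein coding; returns how many changed. */
    static int codingFromCds(List<Transcript> transcripts) {
        int changed = 0;
        for (Transcript tr : transcripts) {
            if (!tr.cds.isEmpty() && !tr.proteinCoding) {
                tr.proteinCoding = true;
                changed++;
            }
        }
        return changed;
    }
}
```

As the counts above suggest, this heuristic also flags transcripts annotated as, for example, nonsense_mediated_decay, which is why the description calls it not always precise.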
collapseZeroLenIntrons
protected void collapseZeroLenIntrons()
Collapse exons that have zero-size introns between them.
create
createRandSequences
protected void createRandSequences()
Create random sequences for exons. Note: This is only used for test cases!
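For test data, random exon sequences only need the right length and a DNA alphabet. This hypothetical sketch draws bases uniformly from A, C, G and T using a caller-supplied `java.util.Random`, so passing a seeded generator makes test runs reproducible. The uniform base distribution and the `RandomSequenceSketch` name are assumptions, not taken from SnpEff.

```java
import java.util.Random;

/** Hypothetical sketch (not SnpEff code): random DNA for an exon, for tests only. */
final class RandomSequenceSketch {
    private static final char[] BASES = {'A', 'C', 'G', 'T'};

    /** Builds a random sequence for an exon spanning [start, end], inclusive. */
    static String randomExonSequence(int start, int end, Random random) {
        if (end < start) throw new IllegalArgumentException("end < start");
        int length = end - start + 1;
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            // Assumption: each base is drawn independently and uniformly
            sb.append(BASES[random.nextInt(BASES.length)]);
        }
        return sb.toString();
    }
}
```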
deleteRedundant
protected void deleteRedundant()
Consolidate transcripts: if two exons are right next to each other, join them. For example, exon1:1234-2345 and exon2:2346-2400 become exon:1234-2400. This happens mostly in GTF files, where the stop codon is specified separately from the exon info.
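The consolidation rule in the example can be sketched as an interval merge: sort exons by start and join each one to the previous exon when it begins at most one base after that exon ends. Merging overlapping exons as well as strictly adjacent ones is an assumption of this sketch, and `ExonMergeSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch (not SnpEff code): join exons that touch or overlap. */
final class ExonMergeSketch {

    /** Merges {start, end} exons (inclusive coordinates); the input list is not modified. */
    static List<int[]> mergeAdjacent(List<int[]> exons) {
        List<int[]> sorted = new ArrayList<>(exons);
        sorted.sort(Comparator.comparingInt(e -> e[0]));
        List<int[]> merged = new ArrayList<>();
        for (int[] e : sorted) {
            int[] last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
            if (last != null && e[0] <= last[1] + 1) {
                // Adjacent (e.g. 2345 then 2346) or overlapping: extend the previous exon
                last[1] = Math.max(last[1], e[1]);
            } else {
                // Gap before this exon: start a new one (copied so callers' arrays stay intact)
                merged.add(new int[] {e[0], e[1]});
            }
        }
        return merged;
    }
}
```

With the coordinates from the description, [1234, 2345] and [2346, 2400] merge into [1234, 2400], while exons separated by a gap stay separate.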
exonsFromCds
protected void exonsFromCds()
Create exons from CDS info.
exonsFromCds
Create exons from CDS info. WARNING: We might end up with redundant exons if some exons existed before this process.
Parameters:
tr - transcript with CDS info, but no exons
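A hedged sketch of deriving exons from CDS intervals for a transcript that has none: each CDS becomes one exon, and ranks are numbered in the direction of transcription. Treating each CDS as its own exon, the rank ordering and the `ExonsFromCdsSketch` type are assumptions; because untranslated regions are not represented, exons built this way may be shorter than the real ones.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch (not SnpEff code): build one exon per CDS interval. */
final class ExonsFromCdsSketch {

    /** Hypothetical exon with inclusive coordinates and a 1-based rank. */
    static final class Exon {
        final int start;
        final int end;
        final int rank;

        Exon(int start, int end, int rank) {
            this.start = start;
            this.end = end;
            this.rank = rank;
        }
    }

    /** Creates exons from {start, end} CDS pairs; rank 1 is the first exon transcribed. */
    static List<Exon> exonsFromCds(List<int[]> cds, boolean strandMinus) {
        List<int[]> ordered = new ArrayList<>(cds);
        Comparator<int[]> byStart = Comparator.comparingInt(c -> c[0]);
        // On the minus strand, transcription runs from high to low coordinates
        ordered.sort(strandMinus ? byStart.reversed() : byStart);
        List<Exon> exons = new ArrayList<>();
        int rank = 1;
        for (int[] c : ordered) {
            exons.add(new Exon(c[0], c[1], rank++));
        }
        return exons;
    }
}
```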
findGene
findGene
findMarker
findTranscript
findTranscript
getOrCreateChromosome
Get a chromosome. If it doesn't exist, create it.
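The get-or-create pattern maps naturally onto `Map.computeIfAbsent`. In this sketch, `ChromosomeRegistrySketch` and its nested `Chromosome` are hypothetical stand-ins; any chromosome-name normalization the real method may perform is not shown.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch (not SnpEff code): look up chromosomes by name, creating them on demand. */
final class ChromosomeRegistrySketch {

    /** Hypothetical minimal chromosome. */
    static final class Chromosome {
        final String name;

        Chromosome(String name) {
            this.name = name;
        }
    }

    private final Map<String, Chromosome> chromosomes = new HashMap<>();

    /** Returns the chromosome called chromoName, creating and registering it on first use. */
    Chromosome getOrCreateChromosome(String chromoName) {
        return chromosomes.computeIfAbsent(chromoName, Chromosome::new);
    }

    int size() {
        return chromosomes.size();
    }
}
```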
getProteinByTrId
parsePosition
Parse a string as a 'position'. Note: It subtracts 'inOffset' so that all coordinates are zero-based.
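A minimal sketch of the offset handling, assuming `inOffset` is an integer set to 1 for one-based input formats such as GTF/GFF and to 0 for input that is already zero-based. `PositionParserSketch` is a hypothetical class, not the factory itself.

```java
/** Hypothetical sketch (not SnpEff code): convert file positions to zero-based coordinates. */
final class PositionParserSketch {
    // Assumption: 1 for one-based input files, 0 for zero-based ones
    private final int inOffset;

    PositionParserSketch(int inOffset) {
        this.inOffset = inOffset;
    }

    /** Parses a decimal position and shifts it to zero-based; throws NumberFormatException on bad input. */
    int parsePosition(String posStr) {
        return Integer.parseInt(posStr.trim()) - inOffset;
    }
}
```

For example, a GTF start of "1000" becomes 999 with an offset of 1.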
readExonSequences
protected void readExonSequences()
Read exon sequences from a FASTA file.
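Reading exon sequences amounts to loading the reference FASTA and slicing each exon's interval out of its chromosome. This hypothetical sketch keeps whole chromosomes in memory, takes the sequence name as the header text up to the first whitespace, and assumes zero-based, inclusive exon coordinates; `FastaSketch` is not a SnpEff class.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch (not SnpEff code): load FASTA and extract exon sequences. */
final class FastaSketch {

    /** Reads FASTA records into a map from sequence name to concatenated sequence. */
    static Map<String, String> readFasta(Reader in) throws IOException {
        Map<String, String> seqs = new LinkedHashMap<>();
        BufferedReader br = new BufferedReader(in);
        String name = null;
        StringBuilder seq = new StringBuilder();
        String line;
        while ((line = br.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) continue;
            if (line.startsWith(">")) {
                // A new header closes the previous record
                if (name != null) seqs.put(name, seq.toString());
                name = line.substring(1).trim().split("\\s+")[0];
                seq.setLength(0);
            } else {
                seq.append(line);
            }
        }
        if (name != null) seqs.put(name, seq.toString());
        return seqs;
    }

    /** Returns the sequence of [start, end] (zero-based, inclusive) on chromosome chr. */
    static String exonSequence(Map<String, String> genome, String chr, int start, int end) {
        String seq = genome.get(chr);
        if (seq == null) throw new IllegalArgumentException("no sequence for " + chr);
        return seq.substring(start, end + 1);
    }
}
```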
replaceTranscript
setCircularCorrectLargeGap
public void setCircularCorrectLargeGap(boolean circularCorrectLargeGap)
setCreateRandSequences
public void setCreateRandSequences(boolean createRandSequences)
setDebug
public void setDebug(boolean debug)
setFastaFile
setFileName
setRandom
setReadSequences
public void setReadSequences(boolean readSequences)
Whether to read sequences. Note: This is only used for debugging and testing.
setStoreSequences
public void setStoreSequences(boolean storeSequences)
setVerbose
public void setVerbose(boolean verbose)
showChromoNamesDifferences
Show differences in chromosome names.
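A sketch of what such a report could contain, assuming the comparison is between chromosome names used in the annotations and names found in the sequence file (a common source of mismatches such as 1 versus chr1). The report wording and the `ChromoNamesDiffSketch` name are hypothetical; sorted sets keep the output stable.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch (not SnpEff code): report chromosome names present on only one side. */
final class ChromoNamesDiffSketch {

    /** Returns an empty string when both name sets agree, otherwise a two-line report. */
    static String showChromoNamesDifferences(Set<String> inGenes, Set<String> inSequences) {
        Set<String> onlyGenes = new TreeSet<>(inGenes);
        onlyGenes.removeAll(inSequences);
        Set<String> onlySeqs = new TreeSet<>(inSequences);
        onlySeqs.removeAll(inGenes);
        if (onlyGenes.isEmpty() && onlySeqs.isEmpty()) return "";
        return "Chromosomes only in annotations: " + onlyGenes + "\n"
            + "Chromosomes only in sequences: " + onlySeqs + "\n";
    }
}
```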