Class DnaCoder

java.lang.Object
org.snpeff.binseq.coder.Coder
org.snpeff.binseq.coder.DnaCoder
All Implemented Interfaces:
Serializable
Direct Known Subclasses:
DnaQualityCoder

public class DnaCoder extends Coder
Class used to encode & decode sequences into binary and vice-versa.
Note: This is a singleton class. It stores DNA bases into 2 bits: {a,c,g,t} <-> {0,1,2,3}
Author:
pcingola
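
The 2-bit mapping described above can be sketched as follows. This is a minimal illustration, not the snpEff source: the class name TwoBitBaseSketch is hypothetical, and both the case-insensitive handling and the exception thrown for non-DNA characters are assumptions.

```java
// Hypothetical sketch of the {a,c,g,t} <-> {0,1,2,3} mapping; not the snpEff implementation.
final class TwoBitBaseSketch {
    // The array index is the 2-bit code: a=0, c=1, g=2, t=3.
    static final char[] TO_BASE = { 'a', 'c', 'g', 't' };

    // Map a base (either case, an assumption) to its 2-bit code.
    static int baseToBits(char c) {
        switch (Character.toLowerCase(c)) {
            case 'a': return 0;
            case 'c': return 1;
            case 'g': return 2;
            case 't': return 3;
            default:
                // Assumed error handling for characters outside {a,c,g,t}.
                throw new IllegalArgumentException("Not a DNA base: '" + c + "'");
        }
    }

    // Map a 2-bit code back to a lower-case base.
    static char toBase(int code) {
        return TO_BASE[code & 3];
    }
}
```

Encoding and then decoding any of the four bases returns the original (lower-case) character.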
  • Field Summary

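    Fields
    Modifier and Type
    Field
    Description
    protected static final int
    BASES_PER_LONGWORD
     
    protected static final int
    BITS_PER_BASE
     
    int[]
    COUNT_DIFFS
     
    static boolean
    debug
     
    protected static final int
    LAST_BASE_IN_LONGWORD
     
    static final long
    MASK_ALL_WORD
     
    long[]
    MASK_BASE
     
    protected static final long
    MASK_FIRST_BASE
     
    long[]
    MASK_HIGH
     
    long[]
    MASK_LOW
     
    static final char[]
    TO_BASE
     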

    Fields inherited from class org.snpeff.binseq.coder.Coder

    BITS_PER_LONGWORD, BYTES_PER_LONGWORD
  • Method Summary

    Modifier and Type
    Method
    Description
    int
    basesPerWord()
    How many bases can we pack in a word
    int
    baseToBits(char c)
    Encode a base using 2 bits
    int
    baseToBits(char c, boolean ignoreErrors)
     
    int
    bitsPerBase()
    How many bits do we need for each base
    void
    copyBases(long[] src, int srcStart, long[] dst, int dstStart, int length)
    Copy 'length' bases from 'src' (starting from 'srcStart') to 'dst' (starting from 'dstStart')
    void
    copyBases(long[] src, long[] dst, int start, int length)
    Copy 'length' bases from 'src' to 'dst' (starting from 'start')
    int
    decodeWord(long word, int pos)
    Decode bits from a given position
    long
    encodeWord(char base, int pos)
    Encode a base to a given position in a word
    static DnaCoder
    get()
     
    int
    lastBaseinWord()
    Index of the last base coded in a word
    int
    length2words(int len)
    Calculate the coded length of a sequence in 'words' (depends on coder)
    long
    mask(int baseIndexInWord)
    Bitmask for a base in a word
    long
    replaceBase(long code, int pos, char newBase)
    Replace the base at a given position in a word
    long
    reverseBases(long code)
    Reverse all bases in 'code'
    int
    score(long[] dst, long[] src, int srcStart, int length, int threshold)
    Calculate a 'score' for a sequence (dst) and a sub-sequence (src).
    char
    toBase(int code)
    Decode a base using 2 bits
    char
    toBase(long word, int pos)
    Decode a base from a given position in a word

    Methods inherited from class org.snpeff.binseq.coder.Coder

    qualityToBits, toQuality

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • debug

      public static boolean debug
    • BITS_PER_BASE

      protected static final int BITS_PER_BASE
    • MASK_FIRST_BASE

      protected static final long MASK_FIRST_BASE
    • BASES_PER_LONGWORD

      protected static final int BASES_PER_LONGWORD
    • LAST_BASE_IN_LONGWORD

      protected static final int LAST_BASE_IN_LONGWORD
    • MASK_ALL_WORD

      public static final long MASK_ALL_WORD
    • TO_BASE

      public static final char[] TO_BASE
    • MASK_BASE

      public long[] MASK_BASE
    • MASK_LOW

      public long[] MASK_LOW
    • MASK_HIGH

      public long[] MASK_HIGH
    • COUNT_DIFFS

      public int[] COUNT_DIFFS
  • Method Details

    • get

      public static DnaCoder get()
    • basesPerWord

      public int basesPerWord()
      Description copied from class: Coder
      How many bases can we pack in a word
      Specified by:
      basesPerWord in class Coder
      Returns:
    • baseToBits

      public int baseToBits(char c)
      Encode a base using 2 bits
      Specified by:
      baseToBits in class Coder
      Parameters:
      c -
      Returns:
    • baseToBits

      public int baseToBits(char c, boolean ignoreErrors)
    • bitsPerBase

      public int bitsPerBase()
      Description copied from class: Coder
      How many bits do we need for each base
      Specified by:
      bitsPerBase in class Coder
      Returns:
    • copyBases

      public void copyBases(long[] src, int srcStart, long[] dst, int dstStart, int length)
      Copy 'length' bases from 'src' (starting from 'srcStart') to 'dst' (starting from 'dstStart')
      Parameters:
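      src - Source codes[]
      srcStart - Index of the first base to copy from 'src'
      dst - Destination codes[]
      dstStart - Index in 'dst' where the first base is written
      length - Number of bases to copy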
    • copyBases

      public void copyBases(long[] src, long[] dst, int start, int length)
      Copy 'length' bases from 'src' to 'dst' (starting from 'start')
      Parameters:
      src -
      dst -
      start -
      length -
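
A minimal sketch of copying bases between packed arrays, one base at a time. It assumes 32 bases per 64-bit word with base 0 stored in the two most significant bits. The class and helper names are hypothetical, and the actual implementation may work differently (for example by copying whole words).

```java
// Hypothetical sketch of base-by-base copying between packed long[] arrays; not the snpEff implementation.
final class CopyBasesSketch {
    static final int BASES_PER_WORD = 32; // assumption: 64-bit word, 2 bits per base

    // Bit offset of a base inside its word (assumption: base 0 sits in the top two bits).
    static int shift(int index) {
        return 62 - 2 * (index % BASES_PER_WORD);
    }

    // Read the 2-bit code of the base at 'index'.
    static int get(long[] codes, int index) {
        return (int) ((codes[index / BASES_PER_WORD] >>> shift(index)) & 3L);
    }

    // Overwrite the base at 'index' with the 2-bit code 'bits'.
    static void set(long[] codes, int index, int bits) {
        int w = index / BASES_PER_WORD;
        int s = shift(index);
        codes[w] = (codes[w] & ~(3L << s)) | (((long) bits) << s);
    }

    // Copy 'length' bases from src (starting at srcStart) to dst (starting at dstStart).
    static void copyBases(long[] src, int srcStart, long[] dst, int dstStart, int length) {
        for (int i = 0; i < length; i++) {
            set(dst, dstStart + i, get(src, srcStart + i));
        }
    }
}
```

Because the copy is done per base, the source and destination ranges may cross word boundaries independently of each other.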
    • decodeWord

      public int decodeWord(long word, int pos)
      Decode bits from a given position
      Specified by:
      decodeWord in class Coder
      Parameters:
      word -
      pos -
      Returns:
    • encodeWord

      public long encodeWord(char base, int pos)
      Encode a base to a given position in a word
      Parameters:
      base -
      pos -
      Returns:
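
The decodeWord and encodeWord entries above describe reading and writing a 2-bit code at a position inside a word. A minimal sketch follows, assuming a 64-bit word that holds 32 bases with position 0 in the two most significant bits. That layout and the class name WordPackingSketch are assumptions, not taken from the snpEff source.

```java
// Hypothetical sketch of placing and reading a base at a position in a 64-bit word.
final class WordPackingSketch {
    static final int BITS_PER_BASE = 2;

    // 2-bit code of a base: a=0, c=1, g=2, t=3.
    static int baseToBits(char c) {
        int code = "acgt".indexOf(Character.toLowerCase(c));
        if (code < 0) throw new IllegalArgumentException("Not a DNA base: '" + c + "'");
        return code;
    }

    // Bit offset for position 'pos' (assumption: position 0 occupies the top two bits).
    static int shift(int pos) {
        return Long.SIZE - BITS_PER_BASE * (pos + 1);
    }

    // Word containing only 'base' at position 'pos'; OR several of these together to build a sequence.
    static long encodeWord(char base, int pos) {
        return ((long) baseToBits(base)) << shift(pos);
    }

    // 2-bit code stored at position 'pos' of 'word'.
    static int decodeWord(long word, int pos) {
        return (int) ((word >>> shift(pos)) & 3L);
    }
}
```

A sequence can be packed by OR-ing encodeWord(base, i) for each position i, and recovered by calling decodeWord for the same positions.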
    • lastBaseinWord

      public int lastBaseinWord()
      Description copied from class: Coder
      Index of the last base coded in a word
      Specified by:
      lastBaseinWord in class Coder
      Returns:
    • length2words

      public int length2words(int len)
      Calculate the coded length of a sequence in 'words' (depends on coder)
      Parameters:
      len -
      Returns:
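
As a worked example of the word-count arithmetic, the sketch below assumes 32 bases per word (64 bits at 2 bits per base) and rounds up. The class name is hypothetical and the real method may compute the value differently.

```java
// Hypothetical sketch: number of 64-bit words needed to hold 'len' bases at 2 bits each.
final class WordCountSketch {
    static final int BASES_PER_WORD = 32; // assumption: 64 bits / 2 bits per base

    // Ceiling division: a partially filled last word still counts as one word.
    static int length2words(int len) {
        return (len + BASES_PER_WORD - 1) / BASES_PER_WORD;
    }
}
```

Under these assumptions a 33-base sequence needs two words, while a 32-base sequence fits in one.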
    • mask

      public long mask(int baseIndexInWord)
      Description copied from class: Coder
      Bitmask for a base in a word
      Specified by:
      mask in class Coder
      Returns:
    • replaceBase

      public long replaceBase(long code, int pos, char newBase)
      Replace the base at a given position in a word
      Parameters:
      code -
      pos -
      newBase -
      Returns:
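
Replacing a base in a packed word amounts to clearing its two bits with a mask and OR-ing in the new code. The sketch below assumes position 0 is stored in the two most significant bits of a 64-bit word; the class and helper names are hypothetical.

```java
// Hypothetical sketch of replacing one base inside a packed 64-bit word.
final class ReplaceBaseSketch {
    // Bit offset of position 'pos' (assumption: position 0 occupies the top two bits).
    static int shift(int pos) {
        return 62 - 2 * pos;
    }

    // Mask selecting the two bits of the base at 'pos'.
    static long mask(int pos) {
        return 3L << shift(pos);
    }

    // Clear the old base at 'pos', then write the 2-bit code of 'newBase' in its place.
    static long replaceBase(long code, int pos, char newBase) {
        long bits = "acgt".indexOf(Character.toLowerCase(newBase));
        if (bits < 0) throw new IllegalArgumentException("Not a DNA base: '" + newBase + "'");
        return (code & ~mask(pos)) | (bits << shift(pos));
    }
}
```

All other positions in the word are left untouched, since only the masked bits change.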
    • reverseBases

      public long reverseBases(long code)
      Reverse all bases in 'code'
      Parameters:
      code -
      Returns:
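
One way to reverse the order of 2-bit bases in a 64-bit word is to reverse all 64 bits and then swap the two bits inside each base back into place. This is a sketch of that technique under the assumption that a word holds 32 bases; it is not necessarily how snpEff implements reverseBases, and the class name is hypothetical.

```java
// Hypothetical sketch: reverse the order of the 32 two-bit bases in a 64-bit word.
final class ReverseBasesSketch {
    static final long EVEN_BITS = 0x5555555555555555L; // bits 0, 2, 4, ... of every pair

    static long reverseBases(long code) {
        // Reversing all 64 bits reverses the base order but also flips the two bits within each base.
        long r = Long.reverse(code);
        // Swap adjacent bits so that each base keeps its original 2-bit code.
        return ((r & EVEN_BITS) << 1) | ((r >>> 1) & EVEN_BITS);
    }
}
```

Applying the function twice returns the original word, which is a quick sanity check for any implementation.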
    • score

      public int score(long[] dst, long[] src, int srcStart, int length, int threshold)
      Calculate a 'score' for a sequence (dst) and a sub-sequence (src). The score is the number of equal bases (or zero if they differ)
      Parameters:
      dst - Destination sequence codes[]
      src - Source sequence codes[]
      srcStart - Source sub-sequence start
      length - Number of bases to compare
      threshold - Number of bases allowed to differ
      Returns:
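
The sketch below shows one possible reading of the score described above: compare 'length' bases of dst (from its first base) against src (from srcStart), return zero as soon as more than 'threshold' bases differ, and otherwise return the number of equal bases. This interpretation, the 32-bases-per-word layout, and the helpers pack and baseAt are all assumptions for illustration.

```java
// Hypothetical sketch of a mismatch-tolerant score over packed sequences; one reading of the documented behaviour.
final class ScoreSketch {
    // Pack a DNA string into long[] (assumption: 32 bases per word, base 0 in the top two bits).
    static long[] pack(String seq) {
        long[] codes = new long[(seq.length() + 31) / 32];
        for (int i = 0; i < seq.length(); i++) {
            long bits = "acgt".indexOf(Character.toLowerCase(seq.charAt(i)));
            if (bits < 0) throw new IllegalArgumentException("Not a DNA base: '" + seq.charAt(i) + "'");
            codes[i / 32] |= bits << (62 - 2 * (i % 32));
        }
        return codes;
    }

    // 2-bit code of the base at 'index' in a packed array.
    static int baseAt(long[] codes, int index) {
        return (int) ((codes[index / 32] >>> (62 - 2 * (index % 32))) & 3L);
    }

    // Number of equal bases, or 0 once more than 'threshold' bases differ.
    static int score(long[] dst, long[] src, int srcStart, int length, int threshold) {
        int diffs = 0;
        for (int i = 0; i < length; i++) {
            if (baseAt(dst, i) != baseAt(src, srcStart + i)) {
                if (++diffs > threshold) return 0;
            }
        }
        return length - diffs;
    }
}
```

For example, comparing "acgt" with the sub-sequence of "ttacctaa" starting at base 2 yields 0 with a threshold of 0 and 3 with a threshold of 1 under this reading.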
    • toBase

      public char toBase(int code)
      Decode a base using 2 bits
      Specified by:
      toBase in class Coder
      Returns:
    • toBase

      public char toBase(long word, int pos)
      Description copied from class: Coder
      Decode a base from a given position in a word
      Specified by:
      toBase in class Coder
      Parameters:
      word -
      pos -
      Returns: