bioseq

class bioseq.Sequence(seq: str, info: str = '')[source]
__init__(seq: str, info: str = '')[source]

Base class of all sequence types

Parameters
  • seq (str) – Sequence

  • info (str) – Some information string about the sequence

_print() str[source]

Output format of the sequence. If the sequence is longer than 30 characters, only the first and last 30 characters are shown.

align(subject: Union[str, Sequence], mode: int = 1) Tuple[str, str, float][source]

Align two sequences. Use bioseq.config.AlignmentConfig to set the alignment scores: match (2), mismatch (-3), gap_open (-3), and gap_extend (-3). The numbers in parentheses are the default values.

Parameters
  • subject (str|Sequence) – Sequence to align

  • mode (int) –

    1: Global alignment with Needleman-Wunsch

    2: Local alignment with Smith-Waterman

Returns

query (str): This sequence after alignment

subject (str): Subject sequence after alignment

score (float): Alignment score

Return type

tuple
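As an illustration of what mode 1 computes, here is a minimal Needleman-Wunsch sketch in plain Python. It is not the library's implementation: the function name needleman_wunsch is hypothetical, and it assumes a single linear gap penalty of -3 rather than separate gap_open and gap_extend values. The match and mismatch defaults follow the values listed above.

```python
def needleman_wunsch(query, subject, match=2, mismatch=-3, gap=-3):
    """Global alignment with a linear gap penalty (assumption).

    Returns (aligned_query, aligned_subject, score).
    """
    n, m = len(query), len(subject)
    # score[i][j] is the best score for aligning query[:i] with subject[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = match if query[i - 1] == subject[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + step,  # align the two characters
                score[i - 1][j] + gap,       # gap in subject
                score[i][j - 1] + gap,       # gap in query
            )

    # Trace back from the bottom-right corner to recover one optimal alignment
    aligned_q, aligned_s = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            step = match if query[i - 1] == subject[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + step:
                aligned_q.append(query[i - 1])
                aligned_s.append(subject[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_q.append(query[i - 1])
            aligned_s.append("-")
            i -= 1
        else:
            aligned_q.append("-")
            aligned_s.append(subject[j - 1])
            j -= 1
    return "".join(reversed(aligned_q)), "".join(reversed(aligned_s)), float(score[n][m])


q, s, sc = needleman_wunsch("ACGT", "AGT")
print(q, s, sc)
```

Smith-Waterman (mode 2) differs mainly in clamping each cell at zero and starting the traceback from the highest-scoring cell, which is what makes the alignment local.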

find(target: Union[str, Sequence]) List[int][source]

Find every occurrence of the target sequence in this sequence and return the positions

Returns

All positions of the target in this sequence

Return type

List[int]
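The documentation does not say whether positions are 0-based or whether overlapping matches are reported. The sketch below shows one plausible behavior, assuming 0-based indexes and overlapping matches; find_all is a hypothetical helper, not part of bioseq.

```python
def find_all(seq, target):
    """Return the 0-based start index of every (possibly overlapping) match."""
    positions = []
    start = seq.find(target)
    while start != -1:
        positions.append(start)
        # Advance by one character so overlapping matches are also found
        start = seq.find(target, start + 1)
    return positions


print(find_all("ATATAT", "ATA"))
```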

mutation(position: Union[str, int, List[int]], target: Union[str, Sequence]) str[source]

Modify this sequence. self.seq can only be changed through this method.

Parameters
  • position (Union[str, int, List[int]]) – character or index(es) to mutate

  • target (Union[str, Sequence]) – the character to substitute

Returns

The sequence after modification

Return type

str
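The three accepted position types could be handled as in the sketch below. This is an interpretation, not the library's code: it assumes a string position means "replace every occurrence of that character", and the helper name mutate is hypothetical.

```python
def mutate(seq, position, target):
    """Return a copy of seq with the selected characters replaced by target."""
    if isinstance(position, str):
        # Assumption: a character selects all of its occurrences
        return seq.replace(position, target)
    indexes = [position] if isinstance(position, int) else list(position)
    chars = list(seq)
    for i in indexes:
        chars[i] = target
    return "".join(chars)


print(mutate("ATGC", 1, "C"))
print(mutate("ATGC", [0, 3], "T"))
```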

toDNA() DNA[source]

Convert to a DNA instance

toPeptide() Peptide[source]

Convert to a Peptide instance

toRNA() RNA[source]

Convert to an RNA instance

property composition: Dict[str, Union[int, float]]

Analyze the composition of the sequence

Returns

The count or percentage of each element in the sequence

Return type

Dict
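A composition of this kind can be sketched with collections.Counter. How the library chooses between counts and percentages is not documented here, so the as_fraction flag below is a hypothetical parameter used only for illustration.

```python
from collections import Counter


def composition(seq, as_fraction=False):
    """Count each element of seq, optionally as a fraction of the length."""
    counts = Counter(seq)
    if not as_fraction:
        return dict(counts)
    return {element: n / len(seq) for element, n in counts.items()}


print(composition("AACG"))
print(composition("AACG", as_fraction=True))
```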

property length: int

The length of the sequence

property seq: str

Read-only property. The sequence can only be modified through mutation().

property weight: float

Calculate the molar mass of the sequence with the following formula:

\[weight_{seq} = \sum_{i=1}^{length} weight_i - 18 \times (length - 1)\]

Returns

Molar mass in daltons (Da)

Return type

float
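A worked instance of the formula, as a sketch: each bond formed between neighboring units releases one water molecule (about 18 Da), so a chain of n units loses n - 1 waters. The two-entry mass table uses the free amino acid molar masses of glycine and alanine and is only an illustrative subset; the table and the function name molar_mass are assumptions, not bioseq data.

```python
# Illustrative subset: molar masses of free glycine and alanine in g/mol
UNIT_MASS = {"G": 75.07, "A": 89.09}
WATER = 18  # mass lost per bond, as in the formula above


def molar_mass(seq, masses=UNIT_MASS):
    """Sum the unit masses and subtract one water per bond."""
    if not seq:
        return 0.0
    return sum(masses[unit] for unit in seq) - WATER * (len(seq) - 1)


# Gly-Ala dipeptide: 75.07 + 89.09 - 18 * (2 - 1)
print(round(molar_mass("GA"), 2))
```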

class bioseq.Peptide(seq: str, info: str = '')[source]
_print() str[source]

The printed peptide starts with “N-” and ends with “-C”, indicating that the sequence runs from the N-terminus to the C-terminus

chargeInpH(pH: float) float[source]

Calculate the charge of the peptide at the given pH

\[ \begin{align}\begin{aligned}pH = pK_a + \lg{\frac{[A^-]}{[HA]}} \implies \frac{[HA]}{[A^-]} = 10^{pK_a - pH}\\charge = \frac{[A^-]}{[A]_{total}} = \frac{[A^-]}{[A^-] + [HA]} = \frac{1}{\frac{[A^-] + [HA]}{[A^-]}} = \frac{1}{1 + \frac{[HA]}{[A^-]}}\end{aligned}\end{align} \]

for acidic residues: \(charge = 1 / (1 + 10 ^{pK_a - pH})\)

for basic residues: \(charge = 1 / (1 + 10^{pH - pK_a})\)

Parameters

pH (float) – pH value

Returns

Charge at the given pH

Return type

float
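The two per-residue expressions above translate directly into code. The sketch below evaluates only the charged fraction of a single ionizable group; the pKa values 4.0 and 10.0 are placeholders chosen for illustration, not values taken from the library.

```python
def acidic_fraction(pKa, pH):
    """Fraction of an acidic group that is deprotonated (charged) at pH."""
    return 1 / (1 + 10 ** (pKa - pH))


def basic_fraction(pKa, pH):
    """Fraction of a basic group that is protonated (charged) at pH."""
    return 1 / (1 + 10 ** (pH - pKa))


# At pH == pKa exactly half of the group is charged
print(acidic_fraction(4.0, 4.0))
print(basic_fraction(10.0, 7.0))
```

Note that both expressions give a magnitude between 0 and 1. A net charge would count the acidic contributions as negative and the basic ones as positive.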

getHphob(window_size: int = 9, show_img: bool = False) List[float][source]

Calculate the hydropathy score. The larger the score, the higher the hydrophobicity. Each amino acid’s score is the average score of all amino acids within a window of window_size residues, so amino acids near the beginning and end of the sequence have no score.

Parameters
  • window_size (int) – the number of residues averaged for each hydropathy value

  • show_img (bool) – whether to plot the result; requires matplotlib

Returns

The hydropathy scores of the peptide

Return type

List[float]
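The sliding-window average can be sketched as follows. Which hydropathy scale the library uses is not stated here; the table below is a small subset of the Kyte-Doolittle scale, chosen as an assumption for illustration, and hydropathy_profile is a hypothetical name.

```python
# Subset of the Kyte-Doolittle hydropathy scale (assumed, for illustration)
KYTE_DOOLITTLE = {"A": 1.8, "I": 4.5, "L": 3.8, "K": -3.9, "D": -3.5, "G": -0.4}


def hydropathy_profile(seq, window_size=9, scale=KYTE_DOOLITTLE):
    """Average the per-residue scores over every full window of window_size."""
    if window_size > len(seq):
        return []
    values = [scale[aa] for aa in seq]
    return [
        sum(values[i:i + window_size]) / window_size
        for i in range(len(values) - window_size + 1)
    ]


profile = hydropathy_profile("AILK", window_size=3)
print([round(v, 3) for v in profile])
```

A sequence of length n yields n - window_size + 1 values, which is why residues at the two ends have no score of their own.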

property pI: float

Calculate the peptide’s pI, the pH at which the peptide’s net charge is zero
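Because the net charge decreases monotonically as pH rises, the pI can be located by bisection. The sketch below is one way to do that and is not necessarily how bioseq computes it. The function names are hypothetical, and the example uses the two ionizable groups of free glycine (pKa 2.34 and 9.60), whose pI is the midpoint of the two values.

```python
def net_charge(pH, acidic_pKas, basic_pKas):
    """Positive charge of basic groups minus negative charge of acidic groups."""
    positive = sum(1 / (1 + 10 ** (pH - pKa)) for pKa in basic_pKas)
    negative = sum(1 / (1 + 10 ** (pKa - pH)) for pKa in acidic_pKas)
    return positive - negative


def isoelectric_point(acidic_pKas, basic_pKas, tol=1e-6):
    """Bisect the pH range 0-14 until the net charge changes sign within tol."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(mid, acidic_pKas, basic_pKas) > 0:
            low = mid   # still positive: the pI lies at a higher pH
        else:
            high = mid  # zero or negative: the pI lies at a lower pH
    return (low + high) / 2


print(round(isoelectric_point([2.34], [9.60]), 2))
```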

class bioseq.RNA(seq: str, info: str = '')[source]
_print()[source]

The printed RNA starts with “5’-” and ends with “-3’”, indicating that the sequence runs from 5’ to 3’

getOrf(topn: int = 1, replace: bool = False) List[str][source]

Find the open reading frames (ORFs) in the sequence and save them in self.orf

Parameters
  • topn (int) – the number of ORFs to return, sorted by length; default is 1

  • replace (bool) – whether to replace the original sequence with the longest ORF

Returns

ORFs found in this sequence

Return type

List[str]
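A minimal ORF search might look like the sketch below. It assumes that an ORF runs from a start codon (AUG) to the first in-frame stop codon (UAA, UAG, or UGA), that the stop codon is included, and that only the given strand is scanned. None of these details are confirmed by the documentation above, and find_orfs is a hypothetical name.

```python
START = "AUG"
STOPS = {"UAA", "UAG", "UGA"}


def find_orfs(rna, topn=1):
    """Return the topn longest ORFs, longest first."""
    orfs = []
    for start in range(len(rna) - 2):
        if rna[start:start + 3] != START:
            continue
        # Walk codon by codon from the start codon to the first stop codon
        for pos in range(start, len(rna) - 2, 3):
            if rna[pos:pos + 3] in STOPS:
                orfs.append(rna[start:pos + 3])
                break
    orfs.sort(key=len, reverse=True)
    return orfs[:topn]


print(find_orfs("GGAUGAAAUAGCC"))
```

Start codons with no in-frame stop are skipped, and a start codon nested inside a longer ORF produces its own shorter entry.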

transcript(topn: int = 1) List[Peptide][source]

Convert the sequence to peptides (biologically, this step is translation). The result is saved in self.peptide.

Parameters

topn (int) – the number of products to return, sorted by length; default is 1

Returns

List of peptide products

Return type

List[Peptide]
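The codon-to-amino-acid step that produces each peptide can be sketched with a lookup table. The table below contains only a handful of codons from the standard genetic code, enough for the example, and the helper translate_orf is hypothetical. The library's own table and return type (Peptide objects) are not reproduced here.

```python
# Partial standard codon table; "*" marks a stop codon
CODONS = {
    "AUG": "M", "AAA": "K", "UUU": "F", "GGC": "G",
    "UAA": "*", "UAG": "*", "UGA": "*",
}


def translate_orf(orf, table=CODONS):
    """Read orf codon by codon and stop at the first stop codon."""
    peptide = []
    for i in range(0, len(orf) - 2, 3):
        amino_acid = table[orf[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


print(translate_orf("AUGAAAUUUUAA"))
```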

property GC: float

Calculate the GC percentage
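As a sketch of the calculation, GC content is the share of G and C among all bases. Whether the property returns a value in the range 0-100 or 0-1 is not stated here; the hypothetical helper below assumes 0-100.

```python
def gc_percentage(seq):
    """Percentage (0-100, assumed) of bases that are G or C."""
    if not seq:
        return 0.0
    gc_count = sum(base in "GC" for base in seq.upper())
    return 100 * gc_count / len(seq)


print(gc_percentage("AUGC"))
```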

property complement: T

The complementary sequence of this sequence

orf: List[str]

Available only after getOrf() has been called

peptide: List[Peptide]

Available only after transcript() has been called

property reversed: T

The reversed sequence of this sequence
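For RNA, complementing and reversing can be sketched on plain strings as below. The helper names are hypothetical and operate on str rather than on RNA instances; chaining the two gives the reverse complement.

```python
# Watson-Crick pairing for RNA: A-U and G-C
RNA_COMPLEMENT = str.maketrans("AUGC", "UACG")


def complement(rna):
    """Replace every base with its pairing partner."""
    return rna.translate(RNA_COMPLEMENT)


def reverse(rna):
    """Return the bases in the opposite order."""
    return rna[::-1]


print(complement("AUGC"))
print(reverse(complement("AUGC")))
```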

class bioseq.DNA(seq: str, info: str = '')[source]
getOrf(topn: int = 1, replace: bool = False) List[str][source]

Return the open reading frames of the mRNA transcribed from this sequence.

Parameters
  • topn (int) – the number of ORFs to return, sorted by length; default is 1

  • replace (bool) – whether to replace the original sequence with the longest ORF

Returns

ORFs found in the mRNA

Return type

List[str]

transcript(topn: int = 1) List[Peptide][source]

Return the peptide products of the mRNA transcribed from this sequence.

Parameters

topn (int) – the number of products to return, sorted by length; default is 1

Returns

List of peptide products

Return type

List[Peptide]

translate() RNA[source]

Convert the sequence to RNA by replacing T with U (biologically, this step is transcription)
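On plain strings, the T-to-U replacement described above is a single substitution. The helper name dna_to_rna is hypothetical, and the sketch treats the DNA as the coding strand, so no complementing is involved.

```python
def dna_to_rna(dna):
    """Replace every thymine (T) with uracil (U)."""
    return dna.replace("T", "U")


print(dna_to_rna("ATGAAATAG"))
```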

property GC: float

Calculate the GC percentage

property complement: T

The complementary sequence of this sequence

orf: List[str]

Available only after getOrf() has been called

peptide: List[Peptide]

Available only after transcript() has been called

property reversed: T

The reversed sequence of this sequence