bioseq

class bioseq.Sequence(seq: str, info: str = '')

__init__(seq: str, info: str = '')

Base class of all sequence types.

Parameters

seq (str) – Sequence
info (str) – Some information string about the sequence

_print() → str[source]: Sequence’s output format, if length more than 30, only show the first and end 30 chars

align(subject: Union[str, Sequence], mode: int = 1) → Tuple[str, str, float]

Align two sequences. Use bioseq.config.AlignmentConfig to set the alignment scores: match (2), mismatch (-3), gap_open (-3), and gap_extend (-3). The numbers in brackets are the default values.

Parameters

subject (str|Sequence) – Sequence to align
mode (int) –
1: Use Needleman-Wunsch for global alignment

2: Use Smith-Waterman for local alignment

Returns

query (str): This sequence after alignment

subject (str): Subject sequence after alignment

score (float): Alignment score

Return type

tuple
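The global mode can be illustrated with a minimal Needleman-Wunsch sketch. It is not the library's implementation: the function name `global_align` is hypothetical, and it assumes a single linear gap penalty (the defaults above set gap_open and gap_extend to the same value, but how the library combines them is not documented here).

```python
from typing import Tuple


def global_align(query: str, subject: str, match: int = 2,
                 mismatch: int = -3, gap: int = -3) -> Tuple[str, str, float]:
    """Needleman-Wunsch global alignment with a linear gap penalty (assumption)."""
    n, m = len(query), len(subject)

    def pair_score(i: int, j: int) -> int:
        # Score for aligning query[i - 1] with subject[j - 1]
        return match if query[i - 1] == subject[j - 1] else mismatch

    # score[i][j] = best score for aligning query[:i] with subject[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair_score(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to rebuild both aligned strings
    aligned_q, aligned_s = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            aligned_q.append(query[i - 1])
            aligned_s.append(subject[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_q.append(query[i - 1])
            aligned_s.append("-")
            i -= 1
        else:
            aligned_q.append("-")
            aligned_s.append(subject[j - 1])
            j -= 1
    return "".join(reversed(aligned_q)), "".join(reversed(aligned_s)), float(score[n][m])


print(global_align("ACGT", "AGT"))
```

Smith-Waterman (mode 2) differs mainly in clamping each cell at zero and starting the traceback from the highest-scoring cell.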

find(target: Union[str, Sequence]) → List[int]

Find the target sequence in this sequence and return its positions.

Returns: All positions of the target in this sequence
Return type: List[int]
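A minimal sketch of such a search on plain strings follows. The helper name `find_all` is hypothetical, and the choices to report 0-based start indexes and to include overlapping matches are assumptions; the documentation above does not state either.

```python
from typing import List


def find_all(seq: str, target: str) -> List[int]:
    """Return every 0-based start index of target in seq, overlaps included."""
    if not target:
        return []
    positions = []
    start = seq.find(target)
    while start != -1:
        positions.append(start)
        # Resume one character later so overlapping matches are also found
        start = seq.find(target, start + 1)
    return positions


print(find_all("ATATAT", "ATA"))
```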

mutation(position: Union[str, int, List[int]], target: Union[str, Sequence]) → str

Mutate this sequence. self.seq can only be modified through this method.

Parameters

position (Union[str, int, List[int]]) – character or index(es) to mutate
target (Union[str, Sequence]) – the replacement character

Returns

The sequence after modification

Return type

str
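One plausible reading of the position argument is sketched below on plain strings. The function `mutate` is hypothetical, and the interpretation is an assumption: a string position replaces every occurrence of that character, while an index or list of indexes replaces the characters at those 0-based positions.

```python
from typing import List, Union


def mutate(seq: str, position: Union[str, int, List[int]], target: str) -> str:
    """Return a mutated copy of seq (assumed semantics, see the note above)."""
    if isinstance(position, str):
        # Character form: replace every occurrence of that character
        return seq.replace(position, target)
    indexes = [position] if isinstance(position, int) else list(position)
    chars = list(seq)
    for index in indexes:
        chars[index] = target
    return "".join(chars)


print(mutate("ACGT", [1, 3], "A"))
```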

toDNA() → DNA: Convert to a DNA instance

toPeptide() → Peptide: Convert to a Peptide instance

toRNA() → RNA: Convert to an RNA instance

property composition: Dict[str, Union[int, float]]

Analyze the composition of the sequence.

Returns: The number of occurrences or the percentage of each element in the sequence
Return type: Dict
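The kind of result this property describes can be sketched with collections.Counter. The function name and the `as_percentage` flag are hypothetical; how the library chooses between counts and percentages is not documented here.

```python
from collections import Counter
from typing import Dict, Union


def composition(seq: str, as_percentage: bool = False) -> Dict[str, Union[int, float]]:
    """Count each element of seq, optionally as a percentage of the length."""
    counts = Counter(seq)
    if not as_percentage:
        return dict(counts)
    total = len(seq)
    return {element: 100 * n / total for element, n in counts.items()}


print(composition("AACG"))
print(composition("AACG", as_percentage=True))
```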

property length: int: The length of the sequence

property seq: str: Read-only property; the sequence can only be modified by mutation()

property weight: float

Calculate the molar mass of the sequence using the formula below

\[weight_{seq} = \sum_i^{length}weight_i - 18 * (length - 1)\]

Returns: Molar mass in daltons
Return type: float
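The formula sums the weights of the free monomers and subtracts 18 (roughly one water molecule) for each of the length - 1 bonds formed. A small worked sketch follows; the function name and the two-entry weight table are illustrative assumptions, not the table the library uses.

```python
# Molar masses of free amino acids in g/mol (illustrative subset only)
MONOMER_WEIGHT = {"G": 75.07, "A": 89.09}


def molar_mass(seq: str) -> float:
    """Sum of monomer weights minus 18 for each of the len(seq) - 1 bonds."""
    if not seq:
        return 0.0
    total = sum(MONOMER_WEIGHT[monomer] for monomer in seq)
    return total - 18 * (len(seq) - 1)


# Dipeptide Gly-Ala: 75.07 + 89.09 - 18 * 1
print(round(molar_mass("GA"), 2))
```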

class bioseq.Peptide(seq: str, info: str = '')

_print() → str: A peptide is printed with the prefix “N-” and the suffix “-C”, indicating that the sequence runs from the N-terminus to the C-terminus

chargeInpH(pH: float) → float

Calculate the charge of the peptide at the given pH

\[ \begin{align}\begin{aligned}pH = pK_a + \lg{\frac{[A^-]}{[HA]}}\implies\frac{[HA]}{[A^-]} = 10^{pK_a - pH}\\charge = \frac{[A^-]}{[A]_{total}} = \frac{[A^-]}{[A^-] + [HA]} = \frac{1}{\frac{[A^-] + [HA]}{[A^-]}} = \frac{1}{1 + \frac{[HA]}{[A^-]}}\end{aligned}\end{align} \]

for acidic residues: \(charge = 1 / (1 + 10 ^{pK_a - pH})\)

for basic residues: \(charge = 1 / (1 + 10^{pH - pK_a})\)

Parameters: pH (float) – pH value
Returns: Charge at the given pH
Return type: float
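The two formulas above can be combined into a net charge as in the following sketch. Everything named here is an assumption for illustration: the function names, the decision to count acidic groups as negative and basic groups as positive, and the pKa values, which are typical textbook figures that vary between sources and may differ from the table the library uses.

```python
# Illustrative pKa values (assumed; tables differ between sources)
N_TERM_PKA = 9.69
C_TERM_PKA = 2.34
ACIDIC_PKA = {"D": 3.86, "E": 4.25, "C": 8.33, "Y": 10.07}
BASIC_PKA = {"H": 6.00, "K": 10.53, "R": 12.48}


def acid_fraction(pKa: float, pH: float) -> float:
    """Deprotonated (negatively charged) fraction of an acidic group."""
    return 1 / (1 + 10 ** (pKa - pH))


def base_fraction(pKa: float, pH: float) -> float:
    """Protonated (positively charged) fraction of a basic group."""
    return 1 / (1 + 10 ** (pH - pKa))


def net_charge(seq: str, pH: float) -> float:
    """Net charge of a peptide: basic groups add charge, acidic groups remove it."""
    charge = base_fraction(N_TERM_PKA, pH) - acid_fraction(C_TERM_PKA, pH)
    for residue in seq:
        if residue in BASIC_PKA:
            charge += base_fraction(BASIC_PKA[residue], pH)
        elif residue in ACIDIC_PKA:
            charge -= acid_fraction(ACIDIC_PKA[residue], pH)
    return charge


print(net_charge("AKDE", 7.0))
```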

getHphob(window_size: int = 9, show_img: bool = False) → List[float]

Calculate the hydropathy score. The larger the score, the higher the hydrophobicity. Each amino acid’s score is the average score of all amino acids within window_size, so some amino acids at the beginning and end of the sequence have no score

Parameters

window_size (int) – the number of amino acids used to calculate the average hydropathy value
show_img (bool) – whether to draw the result; requires matplotlib

Returns

The hydropathy scores of the peptide

Return type

List[float]
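The sliding-window average can be sketched as follows. The function name is hypothetical, and the use of the Kyte-Doolittle hydropathy scale is an assumption; the documentation above does not say which scale the library applies. With a window of size w, a sequence of length n yields n - w + 1 scores, which is why the residues at both ends have none.

```python
from typing import List

# Kyte-Doolittle hydropathy scale (assumed; the library's scale is not stated)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy(seq: str, window_size: int = 9) -> List[float]:
    """Average hydropathy over each full window of window_size residues."""
    if window_size < 1 or window_size > len(seq):
        return []
    scores = [KYTE_DOOLITTLE[residue] for residue in seq]
    return [sum(scores[i:i + window_size]) / window_size
            for i in range(len(seq) - window_size + 1)]


print(hydropathy("MKTAYIAKQRQISFVK", window_size=9))
```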

property pI: float: Calculate the peptide’s pI, the pH at which the peptide’s net charge is zero
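Because net charge falls steadily as pH rises, the zero crossing can be located by bisection. The sketch below is one way to do this and is not necessarily how the library computes pI; `isoelectric_point` and `glycine_charge` are hypothetical names, and the glycine pKa values (2.34 and 9.60) are common textbook figures used only as a check.

```python
from typing import Callable


def isoelectric_point(charge_at: Callable[[float], float],
                      low: float = 0.0, high: float = 14.0,
                      tolerance: float = 1e-4) -> float:
    """Bisection search for the pH where charge_at(pH) crosses zero.

    Assumes charge_at decreases monotonically as pH increases.
    """
    while high - low > tolerance:
        middle = (low + high) / 2
        if charge_at(middle) > 0:
            low = middle   # still positive, so the pI lies at a higher pH
        else:
            high = middle
    return (low + high) / 2


def glycine_charge(pH: float) -> float:
    """Net charge of free glycine: amino group minus carboxyl group."""
    return 1 / (1 + 10 ** (pH - 9.60)) - 1 / (1 + 10 ** (2.34 - pH))


print(round(isoelectric_point(glycine_charge), 2))
```

For a molecule with only these two groups the result matches the midpoint of the two pKa values, (2.34 + 9.60) / 2 = 5.97.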

class bioseq.RNA(seq: str, info: str = '')

_print(): An RNA sequence is printed with the prefix “ 5’- ” and the suffix “ -3’ ”, indicating that the sequence runs from 5’ to 3’

getOrf(topn: int = 1, replace: bool = False) → List[str]

Find the open reading frames (ORFs) in the sequence and save them in self.orf

Parameters

topn (int) – the number of ORFs to return, sorted by length; default is 1
replace (bool) – Replace the original sequence with the longest ORF

Returns

ORFs found in this sequence.

Return type

List[str]
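A minimal ORF scan over the three forward reading frames is sketched below. It is an illustration only: `find_orfs` is a hypothetical name, and the choices to start at AUG, to include the stop codon in the result, and to ignore the reverse strand are assumptions about behavior the documentation above does not specify.

```python
from typing import List

STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_orfs(rna: str, topn: int = 1) -> List[str]:
    """Return the topn longest AUG...stop stretches in the three forward frames."""
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(rna) - 2, 3):
            codon = rna[i:i + 3]
            if start is None and codon == "AUG":
                start = i                        # open a reading frame
            elif start is not None and codon in STOP_CODONS:
                orfs.append(rna[start:i + 3])    # close it, stop codon included
                start = None
    orfs.sort(key=len, reverse=True)
    return orfs[:topn]


print(find_orfs("GGAUGAAAUAGCC"))
```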

transcript(topn: int = 1) → List[Peptide]

Convert the sequence into peptides. The result is saved in self.peptide

Parameters: topn (int) – the number of transcript products to keep, sorted by length; default is 1
Returns: List of transcript products
Return type: List[Peptide]
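The codon-by-codon conversion behind this method can be sketched with the standard genetic code. The helper `to_peptide` is hypothetical and returns a plain string rather than a Peptide instance; stopping at the first stop codon is an assumption.

```python
from itertools import product

# Standard genetic code, codons ordered with bases U, C, A, G at each position
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): amino_acid
               for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def to_peptide(orf: str) -> str:
    """Read an RNA string codon by codon until a stop codon ('*') is reached."""
    residues = []
    for i in range(0, len(orf) - 2, 3):
        amino_acid = CODON_TABLE[orf[i:i + 3]]
        if amino_acid == "*":
            break
        residues.append(amino_acid)
    return "".join(residues)


print(to_peptide("AUGAAAUAG"))
```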

property GC: float: Calculate the GC percentage

property complement: T: Return the complementary sequence of this sequence

orf: List[str]: Only accessible after getOrf() has been called

peptide: List[Peptide]: Only accessible after transcript() has been called

property reversed: T: Return the reversed sequence of this sequence
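The GC, complement, and reversed properties can be illustrated on plain RNA strings as below. The function names are hypothetical and return strings rather than sequence instances, and reporting GC on a 0 to 100 scale is an assumption.

```python
# Watson-Crick pairing for RNA: A<->U and G<->C
RNA_PAIRS = str.maketrans("AUGC", "UACG")


def gc_percentage(seq: str) -> float:
    """Share of G and C bases, on a 0 to 100 scale (assumed)."""
    if not seq:
        return 0.0
    return 100 * sum(base in "GC" for base in seq) / len(seq)


def complement(seq: str) -> str:
    """Swap every base for its pairing partner, keeping the order."""
    return seq.translate(RNA_PAIRS)


def reverse(seq: str) -> str:
    """Read the sequence in the opposite direction."""
    return seq[::-1]


print(gc_percentage("GGCA"), complement("AUGC"), reverse("AUGC"))
```

Applying both `complement` and `reverse` gives the reverse complement, the opposite strand read 5’ to 3’.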

class bioseq.DNA(seq: str, info: str = '')

getOrf(topn: int = 1, replace: bool = False) → List[str]

Return the open reading frames of the mRNA obtained from this sequence via translate().

Parameters

topn (int) – the number of ORFs to return, sorted by length; default is 1
replace (bool) – Replace the original sequence with the longest ORF

Returns

ORFs found in the mRNA

Return type

List[str]

transcript(topn: int = 1) → List[Peptide]

Return the transcript products of the mRNA obtained from this sequence via translate().

Parameters: topn (int) – the number of transcript products to keep, sorted by length; default is 1
Returns: List of transcript products
Return type: List[Peptide]

translate() → RNA: Convert the sequence to RNA by replacing T with U
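On a plain string, the T-to-U replacement described for translate() amounts to the following. The helper `dna_to_rna` is a hypothetical stand-in that returns a string rather than an RNA instance, and it assumes an upper-case input.

```python
def dna_to_rna(dna: str) -> str:
    """Replace every thymine (T) with uracil (U)."""
    return dna.replace("T", "U")


print(dna_to_rna("ATGTTTTAA"))
```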

property GC: float: Calculate the GC percentage

property complement: T: Return the complementary sequence of this sequence

orf: List[str]: Only accessible after getOrf() has been called

peptide: List[Peptide]: Only accessible after transcript() has been called

property reversed: T: Return the reversed sequence of this sequence