bioseq
- class bioseq.Sequence(seq: str, info: str = '')[source]
- __init__(seq: str, info: str = '')[source]
Base class of all sequence types
- Parameters
seq (str) – Sequence
info (str) – Some information string about the sequence
- _print() str [source]
Output format of the sequence. If the sequence is longer than 30 characters, only the first and last 30 characters are shown
- align(subject: Union[str, Sequence], mode: int = 1) Tuple[str, str, float] [source]
Align two sequences. Use
bioseq.config.AlignmentConfig
to set the alignment scores: match (2), mismatch (-3), gap_open (-3), and gap_extend (-3). The numbers in parentheses are the default values.
- Parameters
subject (str|Sequence) – Sequence to align
mode (int) –
1: Use Needleman-Wunsch for global alignment
2: Use Smith-Waterman for local alignment
- Returns
query(str): Self sequence after alignment
subject(str): Subject sequence after alignment
score(float): Alignment score, returned if return_score is chosen
- Return type
tuple
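As a rough illustration of mode 1, the standalone sketch below implements Needleman-Wunsch global alignment with the default scores listed above. It is not the library's implementation: the function name `global_align` is hypothetical, and because gap_open and gap_extend share the same default (-3), the sketch assumes a single linear gap penalty.

```python
from typing import List, Tuple


def global_align(query: str, subject: str, match: float = 2,
                 mismatch: float = -3, gap: float = -3) -> Tuple[str, str, float]:
    """Needleman-Wunsch with a linear gap penalty (hypothetical helper)."""
    n, m = len(query), len(subject)
    # score[i][j] is the best score for aligning query[:i] with subject[:j].
    score: List[List[float]] = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = match if query[i - 1] == subject[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + step,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to rebuild both aligned strings.
    aligned_q, aligned_s = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            step = match if query[i - 1] == subject[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + step:
                aligned_q.append(query[i - 1])
                aligned_s.append(subject[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_q.append(query[i - 1])  # gap in the subject
            aligned_s.append("-")
            i -= 1
        else:
            aligned_q.append("-")  # gap in the query
            aligned_s.append(subject[j - 1])
            j -= 1
    return "".join(reversed(aligned_q)), "".join(reversed(aligned_s)), score[n][m]
```

For example, `global_align("ACGT", "AGT")` pads the shorter sequence with one gap, mirroring the (query, subject, score) tuple described above.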
- find(target: Union[str, Sequence]) List[int] [source]
Find the target sequence in this sequence and return the positions
- Returns
All positions of the target in this sequence
- Return type
List[int]
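A minimal sketch of such a search is shown here. The helper name `find_all` is hypothetical, and the choices of 0-based positions and overlapping matches are assumptions, since the reference above does not state either.

```python
from typing import List


def find_all(sequence: str, target: str) -> List[int]:
    """Return the 0-based start index of every match, overlaps included."""
    positions = []
    start = sequence.find(target)
    while start != -1:
        positions.append(start)
        # Resume one character later so overlapping matches are not skipped.
        start = sequence.find(target, start + 1)
    return positions
```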
- mutation(position: Union[str, int, List[int]], target: Union[str, Sequence]) str [source]
Change this sequence.
self.seq
can only be modified by this function.
- Parameters
position (Union[str, int, List[int]]) – character or index(es) to mutate
target (Union[str, Sequence]) – the replacement character(s) for the mutation
- Returns
The sequence after modification
- Return type
str
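One plausible reading of the position argument is sketched below on a plain string. The helper `mutate` is hypothetical, and two behaviours are assumed rather than documented: a character selects every occurrence of that character, and integer positions are 0-based.

```python
from typing import List, Union


def mutate(seq: str, position: Union[str, int, List[int]], target: str) -> str:
    """Return a copy of seq with the selected characters replaced by target."""
    if isinstance(position, str):
        # Assumed: a character selects every occurrence of that character.
        return seq.replace(position, target)
    indices = [position] if isinstance(position, int) else list(position)
    chars = list(seq)
    for index in indices:
        chars[index] = target  # assumed: 0-based indices
    return "".join(chars)
```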
- property composition: Dict[str, Union[int, float]]
Analyze the composition of the sequence
- Returns
The count or percentage of each element in the sequence
- Return type
Dict
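The two result forms (counts or percentages) can be sketched as follows. The function name and the `as_percentage` flag are hypothetical; the reference does not say how the library chooses between the two forms.

```python
from collections import Counter
from typing import Dict, Union


def composition(seq: str, as_percentage: bool = False) -> Dict[str, Union[int, float]]:
    """Count each element, optionally converting counts to percentages."""
    counts = Counter(seq)
    if not as_percentage:
        return dict(counts)
    total = len(seq)
    return {element: 100 * count / total for element, count in counts.items()}
```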
- property length: int
The length of sequence
- property seq: str
Read-only property, sequence can only be modified by
mutation()
- property weight: float
Calculate the molar mass of the sequence with the following formula:
\[weight_{seq} = \sum_{i=1}^{length} weight_i - 18 \times (length - 1)\]
- Returns
Molar mass in daltons
- Return type
weight(float)
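The formula can be checked by hand on a short peptide: each of the (length - 1) bonds releases one water molecule, approximated as 18. The sketch below uses a small illustrative table of approximate free amino acid masses; the table and the name `molar_mass` are assumptions, not the library's data.

```python
# Approximate molar masses of free amino acids in g/mol (illustrative subset).
ELEMENT_WEIGHT = {"G": 75.07, "A": 89.09, "S": 105.09}


def molar_mass(seq: str) -> float:
    """Sum the element masses and subtract 18 per bond, as in the formula above."""
    if not seq:
        return 0.0
    total = sum(ELEMENT_WEIGHT[element] for element in seq)
    return total - 18 * (len(seq) - 1)
```

For the dipeptide "GA" this gives 75.07 + 89.09 - 18 = 146.16.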
- class bioseq.Peptide(seq: str, info: str = '')[source]
- _print() str [source]
The printed peptide starts with “N-” and ends with “-C”, indicating that the sequence is written from the N-terminus to the C-terminus
- chargeInpH(pH: float) float [source]
Calculate the charge of the peptide at the given pH
\[ \begin{align}\begin{aligned}pH = pK_a + \lg{\frac{[A^-]}{[HA]}}\implies\frac{[HA]}{[A^-]} = 10^{pK_a - pH}\\charge = \frac{[A^-]}{[A]_{total}} = \frac{[A^-]}{[A^-] + [HA]} = \frac{1}{\frac{[A^-] + [HA]}{[A^-]}} = \frac{1}{1 + \frac{[HA]}{[A^-]}}\end{aligned}\end{align} \]
for acidic residues: \(charge = 1 / (1 + 10^{pK_a - pH})\)
for basic residues: \(charge = 1 / (1 + 10^{pH - pK_a})\)
- Parameters
pH (float) – pH value
- Returns
Charge at the given pH
- Return type
float
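The two per-residue formulas above translate directly into code. This sketch computes only the charged fraction of a single ionizable group; the helper name is hypothetical, and how the library assigns signs and sums over residues is not shown here.

```python
def charged_fraction(pka: float, ph: float, acidic: bool) -> float:
    """Fraction of a group that is charged at the given pH.

    Acidic groups are charged when deprotonated: 1 / (1 + 10^(pKa - pH)).
    Basic groups are charged when protonated:    1 / (1 + 10^(pH - pKa)).
    """
    exponent = pka - ph if acidic else ph - pka
    return 1 / (1 + 10 ** exponent)
```

When pH equals the pKa, exactly half of the group is charged in either case.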
- getHphob(window_size: int = 9, show_img: bool = False) List[float] [source]
Calculate the hydropathy score. The larger the score, the higher the hydrophobicity. Each amino acid’s score is the average score of all amino acids within window_size, so some amino acids at the beginning and end of the peptide have no score
- Parameters
window_size (int) – the number of amino acids used to calculate the average hydropathy value
show_img (bool) – whether to draw the result; requires
matplotlib
- Returns
The hydropathy scores of the peptide
- Return type
List[float]
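The windowed average can be sketched as below. The Kyte-Doolittle scale is used here as an assumption, since the reference does not name the scale the library uses, and `hydropathy_profile` is a hypothetical helper.

```python
from typing import List

# Kyte-Doolittle hydropathy values (assumed scale, see note above).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window_size: int = 9) -> List[float]:
    """Average the per-residue scores over each full window along the peptide."""
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [
        sum(scores[i:i + window_size]) / window_size
        for i in range(len(scores) - window_size + 1)
    ]
```

A peptide of length n yields n - window_size + 1 values, which is why residues near both ends have no score of their own.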
- property pI: float
Calculate the peptide’s pI, the pH at which the peptide’s net charge equals zero
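Because net charge falls steadily as pH rises, the zero crossing can be located by bisection. The sketch below is one way to do this under stated assumptions: the pKa values are illustrative rather than taken from the library, and `net_charge` and `isoelectric_point` are hypothetical helpers.

```python
# Illustrative pKa values (assumptions; published tables differ slightly).
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
N_TERM_PKA, C_TERM_PKA = 8.6, 3.6


def net_charge(seq: str, ph: float) -> float:
    """Sum the signed charged fractions of the termini and side chains."""
    charge = 1 / (1 + 10 ** (ph - N_TERM_PKA))   # protonated N-terminus (+)
    charge -= 1 / (1 + 10 ** (C_TERM_PKA - ph))  # deprotonated C-terminus (-)
    for aa in seq:
        if aa in POSITIVE_PKA:
            charge += 1 / (1 + 10 ** (ph - POSITIVE_PKA[aa]))
        elif aa in NEGATIVE_PKA:
            charge -= 1 / (1 + 10 ** (NEGATIVE_PKA[aa] - ph))
    return charge


def isoelectric_point(seq: str, tolerance: float = 1e-4) -> float:
    """Bisect the pH range 0 to 14 until the zero-charge point is bracketed."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid   # still positive, so the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2
```

With these values, a peptide without ionizable side chains has its pI midway between the two terminal pKa values.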
- class bioseq.RNA(seq: str, info: str = '')[source]
- _print()[source]
The printed RNA starts with “5’-” and ends with “-3’”, indicating that the sequence is written from 5’ to 3’
- getOrf(topn: int = 1, replace: bool = False) List[str] [source]
Find the open reading frames (ORFs) in the sequence and save them in
self.orf
- Parameters
topn (int) – the number of ORFs to return, sorted by length; default is 1
replace (bool) – whether to replace the original sequence with the longest ORF
- Returns
The ORFs found in this sequence
- Return type
List[str]
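A simple ORF search is sketched below. It makes several assumptions the reference does not confirm: only the three forward frames are scanned, an ORF runs from AUG through the first in-frame stop codon (inclusive), and unterminated ORFs are ignored. The name `find_orfs` is hypothetical.

```python
from typing import List

STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_orfs(rna: str, topn: int = 1) -> List[str]:
    """Return the topn longest AUG...stop stretches over the three forward frames."""
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(rna) - 2, 3):
            codon = rna[i:i + 3]
            if start is None and codon == "AUG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                orfs.append(rna[start:i + 3])  # keep the stop codon
                start = None
    return sorted(orfs, key=len, reverse=True)[:topn]
```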
- transcript(topn: int = 1) List[Peptide] [source]
Translate the sequence into peptides; the result is saved in
self.peptide
- Parameters
topn (int) – the number of products to return, sorted by length; default is 1
- Returns
List of translation products
- Return type
List[Peptide]
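The codon-by-codon step can be sketched with the standard genetic code. This is a plain-string illustration that returns one-letter amino acid codes rather than Peptide objects; the names `CODON_TABLE` and `translate` are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(orf: str) -> str:
    """Translate codons from the start of orf until a stop codon or the end."""
    peptide = []
    for i in range(0, len(orf) - 2, 3):
        aa = CODON_TABLE[orf[i:i + 3]]
        if aa == "*":  # stop codon ends translation
            break
        peptide.append(aa)
    return "".join(peptide)
```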
- property GC: float
Calculate the GC percentage
- property complement: T
Return self complementary sequence
- orf: List[str]
Only available after getOrf() has been called
- property reversed: T
Return self reversed sequence
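The three properties above correspond to simple string operations, sketched here on plain RNA strings. The function names are hypothetical, and the 0 to 100 scale for the GC value is an assumption based on the word "percentage".

```python
# Watson-Crick pairing for RNA: A-U and C-G.
RNA_PAIRS = str.maketrans("AUCG", "UAGC")


def gc_percentage(seq: str) -> float:
    """Percentage (0 to 100) of bases that are G or C."""
    if not seq:
        return 0.0
    return 100 * sum(base in "GC" for base in seq) / len(seq)


def complement(seq: str) -> str:
    """Swap each base for its pairing partner, keeping the order."""
    return seq.translate(RNA_PAIRS)


def reverse(seq: str) -> str:
    """Return the bases in the opposite order."""
    return seq[::-1]
```

Chaining the last two, `complement(reverse(seq))`, gives the reverse complement.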
- class bioseq.DNA(seq: str, info: str = '')[source]
- getOrf(topn: int = 1, replace: bool = False) List[str] [source]
Return the open reading frames (ORFs) of the mRNA transcribed from this sequence.
- Parameters
topn (int) – the number of ORFs to return, sorted by length; default is 1
replace (bool) – whether to replace the original sequence with the longest ORF
- Returns
The ORFs found in the mRNA
- Return type
List[str]
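The DNA-to-mRNA step that precedes the ORF search can be sketched as follows. Which strand the library treats the stored sequence as is not stated here, so the `is_coding_strand` flag and the `transcribe` helper are assumptions for illustration.

```python
def transcribe(dna: str, is_coding_strand: bool = True) -> str:
    """Return the mRNA (5' to 3') for a DNA strand written 5' to 3'."""
    if is_coding_strand:
        # The mRNA matches the coding strand, with U in place of T.
        return dna.replace("T", "U")
    # The mRNA is the reverse complement of the template strand.
    pairs = str.maketrans("ATCG", "UAGC")
    return dna[::-1].translate(pairs)
```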
- transcript(topn: int = 1) List[Peptide] [source]
Return the translation products of the mRNA transcribed from this sequence.
- Parameters
topn (int) – the number of products to return, sorted by length; default is 1
- Returns
List of translation products
- Return type
List[Peptide]
- property GC: float
Calculate the GC percentage
- property complement: T
Return self complementary sequence
- orf: List[str]
Only available after getOrf() has been called
- property reversed: T
Return self reversed sequence