bioseq.utils

bioseq.utils.fetchENS(uid: str) → DNA
bioseq.utils.fetchENS(uid: List[str]) → List[DNA]

Fetch the sequence corresponding to a UID from the Ensembl REST API.

Parameters

uid (str | List[str]) – a single Ensembl (ENS) unique ID, or a list of them

Returns

The DNA sequence corresponding to uid, or a list of DNA sequences if uid is a list

Return type

DNA | List[DNA]
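The library's implementation is not shown here. As a rough illustration of what an Ensembl REST lookup involves, the sketch below builds the requests with the standard library only, assuming the public server at https://rest.ensembl.org and its sequence/id endpoint (GET for one ID, POST with a JSON body for several). The helper names are hypothetical, and the network call is isolated in its own function.

```python
import json
import urllib.request
from typing import List

# Assumed public Ensembl REST server; not taken from the bioseq source.
ENSEMBL_REST = "https://rest.ensembl.org"


def ensembl_sequence_request(uid: str) -> urllib.request.Request:
    """Build a GET request asking for one sequence as plain text (hypothetical helper)."""
    return urllib.request.Request(
        f"{ENSEMBL_REST}/sequence/id/{uid}",
        headers={"Content-Type": "text/plain"},
    )


def ensembl_batch_request(uids: List[str]) -> urllib.request.Request:
    """Build a POST request for several IDs at once; the reply is JSON (hypothetical helper)."""
    body = json.dumps({"ids": uids}).encode("utf-8")
    return urllib.request.Request(
        f"{ENSEMBL_REST}/sequence/id",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def fetch_ensembl_sequence(uid: str, timeout: float = 30.0) -> str:
    """Send the GET request and return the raw sequence string (requires network access)."""
    with urllib.request.urlopen(ensembl_sequence_request(uid), timeout=timeout) as resp:
        return resp.read().decode("ascii").strip()
```

Keeping request construction separate from the network call makes the URL and headers easy to inspect without contacting the server.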

bioseq.utils.fetchNCBI(uid: str) → Union[DNA, RNA, Peptide]
bioseq.utils.fetchNCBI(uid: List[str]) → List[Sequence]

Fetch the sequence corresponding to a UID from NCBI E-utilities. Only RNA, mRNA (as DNA), and protein records are supported. The RefSeq accession prefixes for these record types are:

Prefix          Explanation
NM_ (mRNA)      Protein-coding transcripts (usually curated)
NR_ (RNA)       Non-protein-coding transcripts
XM_ (mRNA)      Predicted model protein-coding transcript
XR_ (RNA)       Predicted model non-protein-coding transcript
AP_ (Protein)   Annotated on AC alternate assembly
NP_ (Protein)   Associated with an NM or NC accession
YP_ (Protein)   Annotated on genomic molecules without an instantiated transcript record
XP_ (Protein)   Predicted model, associated with an XM accession
WP_ (Protein)   Non-redundant across multiple strains and species
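The prefix table lends itself to a simple lookup. The sketch below, using hypothetical helper names, maps an accession's prefix to its molecule type and to the E-utilities database it would plausibly be fetched from (nuccore for transcripts, protein for proteins), then builds an efetch URL requesting FASTA text. How fetchNCBI itself dispatches is not documented here, so the database mapping is an assumption.

```python
from typing import Tuple
from urllib.parse import urlencode

# Prefix -> (molecule type, assumed E-utilities database), following the prefix table.
REFSEQ_PREFIXES = {
    "NM_": ("mRNA", "nuccore"),
    "NR_": ("RNA", "nuccore"),
    "XM_": ("mRNA", "nuccore"),
    "XR_": ("RNA", "nuccore"),
    "AP_": ("Protein", "protein"),
    "NP_": ("Protein", "protein"),
    "YP_": ("Protein", "protein"),
    "XP_": ("Protein", "protein"),
    "WP_": ("Protein", "protein"),
}

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def classify_accession(uid: str) -> Tuple[str, str]:
    """Return (molecule type, database) for a supported RefSeq accession."""
    prefix = uid[:3]
    if prefix not in REFSEQ_PREFIXES:
        raise ValueError(f"unsupported accession prefix: {uid!r}")
    return REFSEQ_PREFIXES[prefix]


def efetch_fasta_url(uid: str) -> str:
    """Build an efetch URL that asks for the record as FASTA text."""
    _, db = classify_accession(uid)
    query = urlencode({"db": db, "id": uid, "rettype": "fasta", "retmode": "text"})
    return f"{EFETCH}?{query}"
```

Rejecting unknown prefixes up front mirrors the documented restriction to RNA, mRNA, and protein records.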

Parameters

uid (str | List[str]) – a single NCBI unique ID (accession), or a list of them

Returns

If uid is a list, a list of Sequence objects; UIDs with no data on NCBI are omitted, and the concrete type of each element (DNA, RNA, or Peptide) is not guaranteed. Otherwise, the single sequence corresponding to uid.

Return type

DNA | RNA | Peptide | List[Sequence]
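The paired signatures above follow the typing.overload pattern: a str argument yields one result, a list yields a list. The sketch below illustrates that pattern together with the documented list behaviour (UIDs that are not found are dropped). The fetch function and the lookup callable are hypothetical stand-ins for the real network lookup; what happens to a single unknown UID is not documented, so this sketch simply returns None.

```python
from typing import Callable, List, Optional, Union, overload

# Hypothetical stand-in for the remote lookup: returns None for an unknown UID.
Lookup = Callable[[str], Optional[str]]


@overload
def fetch(uid: str, lookup: Lookup) -> Optional[str]: ...


@overload
def fetch(uid: List[str], lookup: Lookup) -> List[str]: ...


def fetch(uid: Union[str, List[str]], lookup: Lookup) -> Union[Optional[str], List[str]]:
    """Dispatch on the argument type, mirroring the documented overloads."""
    if isinstance(uid, str):
        # Single-UID behaviour for a miss is undocumented; this sketch returns None.
        return lookup(uid)
    # List input: UIDs with no data are silently left out of the result.
    found = (lookup(u) for u in uid)
    return [seq for seq in found if seq is not None]
```

For example, with a plain dict standing in for NCBI, fetch(["NM_1", "missing"], {"NM_1": "ACGU"}.get) returns ["ACGU"].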

bioseq.utils.loadFasta(filename: str) → List[Sequence]
bioseq.utils.loadFasta(filename: str, iterator: Literal[True]) → Iterator[Sequence]

Load a FASTA file.

Parameters
  • filename – path to the FASTA file.

  • iterator – set to True when reading a large file; an iterator is then returned instead of a list.

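Returns

The sequences in the file, as a list, or as an iterator when iterator is True

Return type

List[Sequence] | Iterator[Sequence]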
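The iterator option matters because a generator reads one record at a time instead of holding the whole file in memory. Below is a minimal sketch of that idea, assuming plain (header, sequence) tuples in place of the library's Sequence objects; iter_fasta and load_fasta are hypothetical names, not the bioseq implementation.

```python
from typing import Iterator, List, Tuple, Union

Record = Tuple[str, str]  # (header without the leading '>', concatenated sequence)


def iter_fasta(filename: str) -> Iterator[Record]:
    """Yield records one by one; only the current record is kept in memory."""
    header, chunks = None, []
    with open(filename) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)  # sequence may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)  # emit the last record


def load_fasta(filename: str, iterator: bool = False) -> Union[List[Record], Iterator[Record]]:
    """Return a lazy iterator if requested, otherwise read everything into a list."""
    records = iter_fasta(filename)
    return records if iterator else list(records)
```

The list form is convenient for small files; the iterator form keeps memory use roughly constant regardless of file size.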

bioseq.utils.parseFasta(fasta_text: str) → List[Sequence]

Parse a FASTA formatted string.

Parameters

fasta_text (str) – string to be parsed

Returns

The sequences parsed from fasta_text

Return type

List[Sequence]
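For text already in memory, one simple approach is to split on the > that begins each record. This sketch returns (header, sequence) tuples rather than Sequence objects, and parse_fasta_text is a hypothetical name, not the library's parser.

```python
from typing import List, Tuple


def parse_fasta_text(fasta_text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) tuples."""
    records = []
    # Prefix a newline so the first record also matches "\n>"; anything before
    # the first header ends up in element 0 and is discarded by [1:].
    for block in ("\n" + fasta_text.strip()).split("\n>")[1:]:
        header, _, body = block.partition("\n")
        sequence = "".join(body.split())  # remove line breaks and stray whitespace
        records.append((header.strip(), sequence))
    return records
```

Splitting on a newline followed by > (rather than on every >) avoids breaking a record whose header text happens to contain that character.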

bioseq.utils.printAlign(sequence1: str, sequence2: str, spacing: int = 10, line_width: int = 30, show_seq: bool = True)

Print two sequences and their alignment in a readable format.

Parameters
  • sequence1 (str) – the first sequence

  • sequence2 (str) – the second sequence

  • spacing (int) – insert a space after every spacing characters

  • line_width (int) – the width of each printed line

  • show_seq (bool) – if False, print only the alignment result
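The exact layout printAlign produces is not documented beyond its parameters. The sketch below shows one plausible interpretation: wrap both sequences every line_width characters, insert a space every spacing characters, and draw | under identical positions. It returns a string (print it to display); the function name, the | marker, and treating - as a gap are all assumptions.

```python
from typing import List


def _group(text: str, spacing: int) -> str:
    """Insert a space after every `spacing` characters."""
    return " ".join(text[i:i + spacing] for i in range(0, len(text), spacing))


def format_alignment(seq1: str, seq2: str, spacing: int = 10,
                     line_width: int = 30, show_seq: bool = True) -> str:
    """Hypothetical layout: sequence 1, match line, sequence 2, wrapped in blocks."""
    # Assumes equal-length, already aligned inputs; zip stops at the shorter one.
    # '|' marks identical residues; mismatches and '-' gaps are left blank.
    match = "".join("|" if a == b and a != "-" else " " for a, b in zip(seq1, seq2))
    lines: List[str] = []
    for start in range(0, len(match), line_width):
        stop = start + line_width
        if show_seq:
            lines.append(_group(seq1[start:stop], spacing))
        lines.append(_group(match[start:stop], spacing))
        if show_seq:
            lines.append(_group(seq2[start:stop], spacing))
        lines.append("")  # blank line between wrapped blocks
    return "\n".join(lines).rstrip("\n")
```

For instance, print(format_alignment("ACGTACGT", "ACGAACGT", spacing=4, line_width=8)) shows the two sequences in groups of four with a gap in the match line under the T/A mismatch.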