Questions tagged [biopython]

Biopython is a set of freely available tools for biological computation written in Python. Please only use this tag for issues relating to the Biopython suite of tools.

Biopython is a set of freely available tools for biological computation written in Python. It is developed by The Biopython Project, an international association of developers of Python tools for computational molecular biology. It includes a range of bioinformatics functionalities such as:

  • Parsing bioinformatics files into data structures usable by Python

  • Interfaces to commonly used bioinformatics programs (BLAST, ClustalW, and EMBOSS, among others)

  • Classes for working with DNA, RNA, and protein sequences, including feature annotations

  • Tools for performing common operations on sequences, such as translation, transcription and weight calculations

among many others.
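As a rough illustration of the parsing and sequence operations listed above, here is a minimal sketch. It assumes Biopython is installed; the two FASTA records and the variable names are made up for the example and do not come from any real dataset.

```python
from io import StringIO

from Bio import SeqIO

# Hypothetical two-record FASTA text, invented for this example.
fasta_text = (
    ">seq1 example\n"
    "ATGGCCATTGTAATGGGCCGCTGA\n"
    ">seq2 example\n"
    "ATGAAATAG\n"
)

# Parse FASTA into SeqRecord objects; a file name or open handle
# can be passed in place of the StringIO wrapper.
records = list(SeqIO.parse(StringIO(fasta_text), "fasta"))
ids = [record.id for record in records]

dna = records[0].seq                    # a Bio.Seq.Seq object
rna = dna.transcribe()                  # DNA -> RNA (T becomes U)
protein = dna.translate(to_stop=True)   # standard table, stop at first stop codon
revcomp = dna.reverse_complement()

print(ids)
print(rna)
print(protein)
print(revcomp)
```

`SeqIO.parse` returns an iterator, so for large files it is usually better to loop over it directly than to build a list as done here.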

The biopython tag

Questions with this tag should relate to issues involving the Biopython package of tools.

Learning More

The web site http://www.biopython.org provides an online resource for modules, scripts, and web links for developers of Python-based software for life science research. It also has a useful wiki site.

The Biopython Cookbook provides many examples of Biopython being used as well as installation instructions and a FAQ section.

1345 questions
58
votes
11 answers

How to call module written with argparse in iPython notebook

I am trying to pass BioPython sequences to Ilya Stepanov's implementation of Ukkonen's suffix tree algorithm in iPython's notebook environment. I am stumbling on the argparse component. I have never had to deal directly with argparse before. How…
Niels
  • 1,513
  • 1
  • 14
  • 21
17
votes
14 answers

biopython no module named Bio

FYI: this is NOT a duplicate! Before running my Python code I installed biopython in the cmd prompt: pip install biopython I then get an error saying 'No module named Bio' when I try to import it in Python: import Bio The same thing happens…
Gabriel
  • 405
  • 1
  • 6
  • 16
15
votes
3 answers

SeqIO.parse on a fasta.gz

New to coding. New to Python/biopython; this is my first question online, ever. How do I open a compressed fasta.gz file to extract info and perform calculations in my function? Here is a simplified example of what I'm trying to do (I've tried…
MelBel88
  • 165
  • 1
  • 1
  • 6
14
votes
11 answers

How do I convert the three letter amino acid codes to one letter code with python or R?

I have a fasta file as shown below. I would like to convert the three letter codes to one letter code. How can I do this with python or R? >2ppo ARGHISLEULEULYS >3oot METHISARGARGMET desired output >2ppo RHLLK >3oot MHRRM your suggestions would…
user1725152
  • 141
  • 1
  • 1
  • 4
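For the three-letter to one-letter conversion asked about above, a minimal dependency-free sketch might look like the following. The function names and the choice of "X" for unrecognized codes are assumptions made for illustration; only the 20 standard amino acids are covered. Biopython's `Bio.SeqUtils` module also offers a `seq1` function for this kind of conversion.

```python
# Standard 20 amino acids: three-letter code (upper case) -> one-letter code.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def three_to_one(seq, unknown="X"):
    """Convert a run of three-letter codes (e.g. 'ARGHIS') to one-letter codes."""
    seq = seq.strip().upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    # Walk the string in steps of three and look each triplet up.
    return "".join(
        THREE_TO_ONE.get(seq[i:i + 3], unknown) for i in range(0, len(seq), 3)
    )


def convert_fasta(text):
    """Convert every sequence line of FASTA text, leaving header lines untouched."""
    converted = []
    for line in text.splitlines():
        if line.startswith(">") or not line.strip():
            converted.append(line)
        else:
            converted.append(three_to_one(line))
    return "\n".join(converted)


example = ">2ppo\nARGHISLEULEULYS\n>3oot\nMETHISARGARGMET"
print(convert_fasta(example))
```

This treats each sequence line independently, so it assumes every line holds a whole number of three-letter codes.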
13
votes
10 answers

Reverse complement of DNA strand using Python

I have a DNA sequence and would like to get reverse complement of it using Python. It is in one of the columns of a CSV file and I'd like to write the reverse complement to another column in the same file. The tricky part is, there are a few cells…
user3783999
  • 571
  • 2
  • 7
  • 17
13
votes
3 answers

How to find an open reading frame in Python

I am using Python and a regular expression to find an ORF (open reading frame). Find a sub-string of a string that is composed ONLY of the letters ATGC (no spaces or new lines) that: Starts with ATG, ends with TAG or TAA or TGA and should consider the…
Nodnin
  • 451
  • 2
  • 9
  • 21
10
votes
3 answers

Why can't python find some modules when I'm running CGI scripts from the web?

I have no idea what could be the problem here: I have some modules from Biopython which I can import easily when using the interactive prompt or executing python scripts via the command-line. The problem is, when I try and import the same biopython…
Dave
  • 2,396
  • 2
  • 22
  • 25
9
votes
2 answers

Biopython SeqIO to Pandas Dataframe

I have a FASTA file that can easily be parsed by SeqIO.parse. I am interested in extracting sequence ID's and sequence lengths. I used these lines to do it, but I feel it's waaaay too heavy (two iterations, conversions, etc.) from Bio import…
Sara
  • 933
  • 2
  • 10
  • 15
9
votes
1 answer

Traceback in Smith-Waterman algorithm with affine gap penalty

I'm trying to implement the Smith-Waterman algorithm for local sequence alignment using the affine gap penalty function. I think I understand how to initiate and compute the matrices required for calculating alignment scores, but am clueless as to…
jonwells
  • 213
  • 2
  • 7
8
votes
2 answers

Is there a function that can calculate a score for aligned sequences given the alignment parameters?

I am trying to score already-aligned sequences. Let's say seq1 = 'PAVKDLGAEG-ASDKGT--SHVVY----------TI-QLASTFE' seq2 = 'PAVEDLGATG-ANDKGT--LYNIYARNTEGHPRSTV-QLGSTFE' with given parameters substitution matrix : blosum62 gap open penalty : -5 gap…
Jessada Thutkawkorapin
  • 1,336
  • 3
  • 16
  • 32
8
votes
1 answer

Is there a way with biopython to obtain the full abstract from a pubmed article?

I currently have the following code which queries pubmed: from Bio import Entrez Entrez.email = "kuharrw@hiram.edu" # Always tell NCBI who you are handle = Entrez.esearch(db="pubmed", term="bacteria") record = Entrez.read(handle) list =…
8
votes
4 answers

multiFASTA file processing

I was curious to know if there is any bioinformatics tool out there able to process a multiFASTA file, giving me information like number of sequences, length, nucleotide/amino acid content, etc., and maybe automatically draw descriptive plots. Also an R…
Federico Giorgi
  • 10,495
  • 9
  • 42
  • 56
7
votes
1 answer

In python, how can I change the font size of leaf nodes when generating phylogenetic trees using Bio.Phylo.draw()?

I am using the Phylo package from Biopython to create phylogenetic trees. For big trees, I need to decrease the fontsize of the leaf nodes. It has been suggested to change matplotlib.pyplot.rcParams['font.size'] but this only allows me to change…
madcap
  • 163
  • 2
  • 7
7
votes
2 answers

Using Biopython (Python) to extract sequence from FASTA file

Ok so I need to extract part of a sequence from a FASTA file, using python (biopython, http://biopython.org/DIST/docs/tutorial/Tutorial.html) I need to get the first 10 bases from each sequence and put them in one file, preserving the sequence info…
user1784467
  • 455
  • 5
  • 9
  • 16
6
votes
3 answers

renumber residues in a protein structure file (pdb)

Hi I am currently involved in making a website aimed at combining all papillomavirus information in a single place. As part of the effort we are curating all known files on public servers (e.g. genbank) One of the issues I ran into was that many…
Stylize
  • 1,058
  • 5
  • 16
  • 32