I have a GenBank file containing a number of sequences. I have a second text file, a TSV, that contains the names of these sequences along with some other information about them, which I read in as a pandas DataFrame. I used the .sample function to randomly select a name from this data, which I assigned to the variable n_name, as shown in the block of code below.
n = df_bp_pos_2.sample(n = 1)
n_value = n.iloc[:2]
n_name = n.iloc[:1]
n_name is equal to the LOCUS name in the GenBank file and matches its case exactly. I am trying to parse the GenBank file and extract the sequence that has locus = n_name. The GenBank file is named all.gb. I have:
from Bio import SeqIO
for seq_record in SeqIO.parse("all.gb", "genbank"):
But I am not sure what the next line or two should be in order to select a record by its locus. Any ideas?
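For reference, here is a minimal sketch of one way the loop body could look. It relies on Biopython storing the GenBank LOCUS name in each record's .name attribute, and it assumes n_name is a plain Python string. The function names find_record_by_locus and load_record_from_genbank are hypothetical, made up for this illustration.

```python
def find_record_by_locus(records, locus_name):
    """Return the first record whose .name equals locus_name, or None.

    `records` can be any iterable of objects with a .name attribute,
    such as the iterator returned by SeqIO.parse(..., "genbank").
    The comparison is case-sensitive.
    """
    for record in records:
        if record.name == locus_name:
            return record
    return None


def load_record_from_genbank(path, locus_name):
    """Hypothetical wrapper: scan a GenBank file for one locus name."""
    # Imported here so the helper above stays usable without Biopython.
    from Bio import SeqIO

    return find_record_by_locus(SeqIO.parse(path, "genbank"), locus_name)


# Example usage (assumes all.gb exists and n_name is a str):
# record = load_record_from_genbank("all.gb", n_name)
# if record is not None:
#     print(record.seq)
```

Note that n.iloc[:1] returns a one-row DataFrame rather than a string, so the comparison above would not match it directly. Pulling out a scalar first, for example with n.iloc[0, 0] if the locus name happens to be in the first column (an assumption about the TSV layout), is likely needed.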