I'm having a problem with my code. I'm trying to iterate through the list of genes in a GenBank file using Biopython and collect each CDS in a list. Here's what it looks like:

    from Bio import SeqIO

    class genBank:
        gbProtId = str()
        gbStart = int()
        gbStop = int()
        gbStrand = int()

    genBankEntries = list()
    for seq_record in SeqIO.parse(genBankFile, "genbank"):
        for seq_feature in seq_record.features:
            genBankEntry = genBank
            if seq_feature.type == "CDS":
                genBankEntry.gbProtId = seq_feature.qualifiers['protein_id']
                genBankEntry.gbStart = seq_feature.location.start  # prodigal GFF3 output is 1 based indexing
                genBankEntry.gbStop = seq_feature.location.end
                genBankEntry.gbStrand = seq_feature.strand
                genBankEntries.append(genBankEntry)

It looks like it should work, but when I run it, the resulting list genBankEntries has one element per gene in the GenBank file, and every element holds only the values from the final feature in seq_record.features:

    00 = {type} <class '__main__.genBank'>
        gbProtId = {list} ['BAA31840.1']
        gbStart = {ExactPosition} 90649
        gbStop = {ExactPosition} 91648
        gbStrand = {int} 1
    ...
    82 = {type} <class '__main__.genBank'>
        gbProtId = {list} ['BAA31840.1']
        gbStart = {ExactPosition} 90649
        gbStop = {ExactPosition} 91648
        gbStrand = {int} 1

This is especially confusing because both for-loops seem to work correctly:

    for seq_record in SeqIO.parse(genBankFile, "genbank"):
        for seq_feature in seq_record.features:
            print(seq_feature)

Why does every list element end up with the same values?
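
A likely explanation can be sketched without Biopython. The debugger output lists every element as `{type} <class '__main__.genBank'>`, which suggests the list holds the class itself rather than separate instances: `genBankEntry = genBank` (no parentheses) binds the name to the class object, so each iteration overwrites the same class attributes and appends the same object again. The class name `Entry` and the values below are made up for illustration; this is a minimal reproduction of that pattern, not the original script.

```python
class Entry:
    # Class-level attribute, shared until an instance overrides it.
    prot_id = ""


# Pattern from the question: `Entry` without parentheses is the class object
# itself, so every iteration writes to, and appends, the same object.
shared = []
for value in ["A", "B", "C"]:
    entry = Entry          # no call, so no new instance is created
    entry.prot_id = value  # overwrites the class attribute
    shared.append(entry)

# Calling `Entry()` creates a separate object on each iteration.
separate = []
for value in ["A", "B", "C"]:
    entry = Entry()        # new instance each time
    entry.prot_id = value  # sets an attribute on this instance only
    separate.append(entry)

shared_ids = [e.prot_id for e in shared]
separate_ids = [e.prot_id for e in separate]
print(shared_ids)    # every element shows the last value written
print(separate_ids)  # each element keeps its own value
```

If this is the cause, changing `genBankEntry = genBank` to `genBankEntry = genBank()` inside the loop should give each CDS its own object.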