How do I edit AND SAVE the sequence of a genbank file to a NEW genbank file using biopython?

Question

I have a .gbk file that's wrong, and I have the list of corrections that follows the format of

"Address of Nuclotide: correct nucleotide"

1:T
2:C
4:A
63:A
324:G
etc...

I know how to open and parse exact original sequence with

list(SeqIO.parse(sys.argv[1], "genbank"))[0].seq

I just need to know how to replace it with my own nucleotide corrections. I've tried

seq_records[0].seq = "".join(dna_refseq)

Where the dna_refseq is a was just a list that constitutes the entire genome

I literally cannot find this specific action anywhere in the documentation or online, and intuitively, this is something that biopython should be capable of.

Markus · Accepted Answer · 2016-04-07T20:41:43.270

2

You are assigning a string where a Bio.Seq object is expected. For me, this works:

from Bio import Seq
from Bio import SeqIO

my_entries = list(SeqIO.parse('my_file.gb', 'genbank'))
my_entry = my_entries[0]

# Make a new Seq object and assing to my_entry.seq. 'TTT' is just an example sequence
my_entry.seq = Seq.Seq('TTT', my_entry.seq.alphabet) 

# Write back to file
SeqIO.write(my_entries, 'my_updated_file.gb', 'genbank')

If your Genbank file has only one entry, you might consider using SeqIO.read:

my_entry = SeqIO.read('my_file.gb', 'genbank')

my_entry.seq = Seq.Seq('TTT', my_entry.seq.alphabet)
SeqIO.write(my_entry, 'my_updated_file.gb', 'genbank')

Alternatively, you can directly convert the sequence into a mutable sequence and manipulate it directly:

from Bio import SeqIO

my_entry = list(SeqIO.parse('my_file.gb', 'genbank'))[0]
my_entry.seq = my_entry.seq.tomutable()

my_entry.seq[0] = 'T'  # Remember that Genbank position 1 is 0 in seq
my_entry.seq[1] = 'C'
....
SeqIO.write(my_entry, 'my_updated_file.gb', 'genbank')

edited Apr 07 '16 at 20:41

answered Apr 07 '16 at 07:05

Markus

385
2
10

What's the 'TTT' for? – Tom Apr 07 '16 at 18:12
And how would I go back to saving it to a genbank? – Tom Apr 07 '16 at 18:36
@Tom:`'TTT'`is just an example for a sequence, in your case you would put your updated sequence here. Writing to a file is done with `SeqIO.write()`. I have edited my answer. – Markus Apr 07 '16 at 20:45
I think I ended up using seq_records[0].seq = Seq("".join(dna_refseq),IUPAC.unambiguous_dna). Seems like I was missing the cryptic IUPAC.ambiguous_dna portion. Apparently THAT is what enables you to forcefully make the atomic data structure to be saved as a seq class object. – Tom Apr 09 '16 at 01:23

How do I edit AND SAVE the sequence of a genbank file to a NEW genbank file using biopython?

1 Answers1