I have a problem downloading a full record from the Nucleotide database. I use:

from Bio import Entrez
from Bio import SeqIO

with Entrez.efetch(db="nuccore", rettype="gb", retmode="full", id="NC_007384") as handle:
    seq_record = SeqIO.read(handle, "gb") 

print(seq_record)

This gives me a shortened version of the GenBank file, so the command:

seq_record.features

does not return features.

In comparison, there is no problem when I do the same thing with a GenBank ID:

with Entrez.efetch(db="nuccore", rettype="gb", retmode="full", id="CP014768.1") as handle:
    seq_record = SeqIO.read(handle, "gb") 

print(seq_record)

After that I can extract every annotated feature from the list seq_record.features.

Is there a way to download full RefSeq records using Efetch?

Some student

1 Answer

To fetch all of the features, you need to either pass style="withparts" or change rettype to "gbwithparts". This table has some information.

>>> from Bio import Entrez
>>> from Bio import SeqIO
>>> Entrez.email = 'someone@email.com'
>>> with Entrez.efetch(db="nuccore", rettype="gb", retmode="full", id="NC_007384") as handle:
...     seq_record = SeqIO.read(handle, "gb") 
... 
>>> len(seq_record.features)
1
>>> with Entrez.efetch(db="nuccore", rettype="gbwithparts", retmode="full", id="NC_007384") as handle:
...     seq_record = SeqIO.read(handle, "gb") 
... 
>>> len(seq_record.features)
10616
>>> with Entrez.efetch(db="nuccore", rettype="gb", style="withparts", retmode="full", id="NC_007384") as handle:
...     seq_record = SeqIO.read(handle, "gb")
... 
>>> len(seq_record.features)
10616
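
The session above can be wrapped into a small reusable helper. The following is only a minimal sketch, not a definitive implementation: the function names fetch_full_record and count_feature_types are hypothetical, and retmode="text" is an assumption (it is the mode NCBI's table lists for GenBank flat files, whereas the session above used retmode="full").

```python
from collections import Counter


def fetch_full_record(accession, email):
    """Fetch one nuccore record with all annotated features included."""
    # Imported here so the helper below can be used without Biopython installed.
    from Bio import Entrez, SeqIO

    Entrez.email = email  # NCBI asks for a contact address
    # rettype="gbwithparts" requests the full record; retmode="text" is an
    # assumption, not taken from the session above.
    with Entrez.efetch(db="nuccore", rettype="gbwithparts",
                       retmode="text", id=accession) as handle:
        return SeqIO.read(handle, "gb")


def count_feature_types(features):
    """Return a Counter mapping feature type (e.g. 'gene', 'CDS') to count."""
    return Counter(feature.type for feature in features)


# Example usage (needs network access, so it is left commented out):
# record = fetch_full_record("NC_007384", "someone@email.com")
# print(len(record.features))
# print(count_feature_types(record.features).most_common(5))
```

Counting feature types is a quick way to confirm that the record really contains genes and CDS entries rather than only the single source feature.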
vkkodali
  • This information is extremely useful to me, though I can't find options for the "style" keyword documented anywhere in either the Biopython or the efetch documentation. Do you have a source for that? – Dan Jun 19 '20 at 18:44
  • Additionally, retmode='full' didn't appear to be an option for nuccore in the table you linked. – Dan Jun 19 '20 at 18:51