I am trying to use Entrez to import publication data into a database. The search part works fine, but when I try to parse:

from Bio import Entrez

def create_publication(pmid):

    handle = Entrez.efetch("pubmed", id=pmid, retmode="xml")
    records = Entrez.parse(handle)
    item_data = records.next()
    handle.close()

... I get the following error:

File "/venv/lib/python2.7/site-packages/Bio/Entrez/Parser.py", line 296, in parse
    raise ValueError("The XML file does not represent a list. Please use Entrez.read instead of Entrez.parse")
ValueError: The XML file does not represent a list. Please use Entrez.read instead of Entrez.parse

This code used to work until a few days ago. Any ideas what might be going wrong here?

Also, following the example listed in the source code (http://biopython.org/DIST/docs/api/Bio.Entrez-pysrc.html) gives the same error:

from Bio import Entrez 
Entrez.email = "Your.Name.Here@example.org"
handle = Entrez.efetch("pubmed", id="19304878,14630660", retmode="xml")    
records = Entrez.parse(handle) 
for record in records: 
    print(record['MedlineCitation']['Article']['ArticleTitle']) 
handle.close()
apiljic
  • Silly question, but have you tried using `Entrez.read()`, and then parsing the results? – MattDMo Dec 22 '16 at 17:42
  • read() works in principle, but there is a whole bunch of other code around this, so when I try it, I just keep getting different errors. Either there is a simple fix for parse(), or I need to rewrite the rest. – apiljic Dec 22 '16 at 17:45
  • This used to work until three days ago, but it seems something changed at PubMed recently, so it fails now. – apiljic Dec 22 '16 at 17:46
  • If it makes you feel any better, I'm getting the same error with the second bit of code you posted. `records` is a generator object, but I can't seem to read it, so I'm not sure what it contains... – MattDMo Dec 22 '16 at 17:56
  • Yep, same here. I'll go for read() then. But maybe I'll try to get in touch with the people at NIH who run PubMed. If they made the change deliberately, then it is fine. But it could also be a bug they are not aware of. – apiljic Dec 23 '16 at 00:10
  • Looks like [biopython devs are aware](https://github.com/biopython/biopython/issues/1027) – Kevin Dec 23 '16 at 14:52
  • Thanks Kevin for sharing the link! – apiljic Dec 26 '16 at 14:52
  • @apiljic the [GitHub Issue](https://github.com/biopython/biopython/issues/1027) is now closed, just FYI – nbryans Feb 02 '17 at 23:06

1 Answer


The issue, as documented in the comments and the GitHub issue, is caused by a deliberate change made by the NCBI Entrez Utilities developers. As documented in that issue by Jhird, you can change your code to use `Entrez.read` and take the list of articles from the `PubmedArticle` key:

from Bio import Entrez 
Entrez.email = "Your.Name.Here@example.org"
handle = Entrez.efetch("pubmed", id="19304878,14630660", retmode="xml")  

records = Entrez.read(handle)      # Difference here
records = records['PubmedArticle'] # New line here  

for record in records: 
    print(record['MedlineCitation']['Article']['ArticleTitle']) 
handle.close()
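
If a lot of surrounding code was written against the old `Entrez.parse` behaviour, one option is to hide the change behind a small generator so that callers can keep iterating over records. The following is only a minimal sketch: `iter_pubmed_articles` and `fetch_pubmed_articles` are hypothetical helper names, not part of Biopython, and the sketch assumes the result of `Entrez.read` is either a plain list of records or a dictionary-like object with a `PubmedArticle` key, as in the code above.

```python
def iter_pubmed_articles(parsed):
    """Yield article records from the result of Entrez.read().

    Assumption: `parsed` is either a list of records or a dict-like
    object whose 'PubmedArticle' key holds that list.
    """
    if isinstance(parsed, dict):
        parsed = parsed.get("PubmedArticle", [])
    for record in parsed:
        yield record


def fetch_pubmed_articles(pmids):
    """Hypothetical wrapper: fetch the given PMIDs and iterate over the records."""
    # Imported here so that iter_pubmed_articles itself has no dependency.
    from Bio import Entrez

    handle = Entrez.efetch(db="pubmed", id=",".join(pmids), retmode="xml")
    try:
        parsed = Entrez.read(handle)
    finally:
        handle.close()
    return iter_pubmed_articles(parsed)
```

With this in place, code that previously took the first record from `Entrez.parse(handle)` could call `next(fetch_pubmed_articles([pmid]))` instead, after setting `Entrez.email` as before.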
nbryans