I want to extract some IDs (doi, pmcid, and pmid) from the record tag of an .xml file using Python:

XML file:

<pmcids status="ok">
    <request idtype="doi" dois="" versions="yes" showaiid="no">
        <warning>no e-mail provided</warning>
        <warning>no tool provided</warning>
        <echo>ids=10.1371%2Fjournal.pone.0054577</echo>
    </request>
    <record requested-id="10.1371/JOURNAL.PONE.0054577"     pmcid="PMC3557238" pmid="23382917" doi="10.1371/journal.pone.0054577">
        <versions><version pmcid="PMC3557238.1" current="true"/>
        </versions>
    </record>
</pmcids>

I have tried the following Python code:

import xml.etree.cElementTree as etree

xmlDoc = open('garbage_collector/tmp.xml', 'r')
xmlDocData = xmlDoc.read()
xmlDocTree = etree.XML(xmlDocData)

for ingredient in xmlDocTree.iter('record'):
    print ingredient[0].text

I want pmcid, doi, and pmid as output, each as a string.

1 Answer

If you can use BeautifulSoup, you could do:

from bs4 import BeautifulSoup
# Name the parser explicitly; omitting it makes bs4 guess and emit a warning.
soup = BeautifulSoup(input_xml, 'html.parser')
t = soup.find('record')

where input_xml is the XML to be examined, as a string.

We find the record tag with the find() function and store it in a variable t. The attributes of the <record> tag can now be accessed by indexing t with the attribute name.

print(t['pmcid'])
print(t['doi'])
print(t['pmid'])

would print

PMC3557238
10.1371/journal.pone.0054577
23382917
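
If installing BeautifulSoup is not an option, the same attributes can probably be read with the standard library's xml.etree.ElementTree, which is what the question started from. The following is only a minimal sketch, assuming Python 3 and the XML held in a string; extract_ids is a hypothetical helper name introduced for illustration, and the sample document is copied from the question.

```python
import xml.etree.ElementTree as ET

# Sample document copied from the question.
input_xml = """<pmcids status="ok">
    <request idtype="doi" dois="" versions="yes" showaiid="no">
        <warning>no e-mail provided</warning>
        <warning>no tool provided</warning>
        <echo>ids=10.1371%2Fjournal.pone.0054577</echo>
    </request>
    <record requested-id="10.1371/JOURNAL.PONE.0054577"     pmcid="PMC3557238" pmid="23382917" doi="10.1371/journal.pone.0054577">
        <versions><version pmcid="PMC3557238.1" current="true"/>
        </versions>
    </record>
</pmcids>"""


def extract_ids(xml_text):
    """Return the pmcid, doi, and pmid attributes of the first <record>.

    Hypothetical helper, not part of the original question or answer.
    """
    root = ET.fromstring(xml_text)
    # <record> is a direct child of the <pmcids> root element.
    record = root.find("record")
    if record is None:
        raise ValueError("no <record> element found")
    # The IDs are attributes of the element, not its text, so use .get().
    return {name: record.get(name) for name in ("pmcid", "doi", "pmid")}


ids = extract_ids(input_xml)
print(ids["pmcid"])
print(ids["doi"])
print(ids["pmid"])
```

The attempt in the question printed ingredient[0].text, which is the text of the first child element (<versions>) rather than the attributes of <record>; reading the attributes with .get() is the difference here.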