Can't parse url xml in python

Question

I've been trying for hours to parse this sample XML from a URL using Python, but I can't extract the definition. Here is what a sample looks like:

<entry_list version="1.0">
  <entry id="polycystic kidney disease">
    <ew>polycystic kidney disease</ew>
    <hw>polycystic kidney disease</hw>
    <fl>noun</fl>
    <def>
      <sensb>
        <sens>
          <dt> Blah blah blah
          </dt>
        </sens>
      </sensb>
    </def>
  </entry>
</entry_list>

I'm trying to access the 'dt' tag because that is where my definition is. This is a short version of the url that contains the xml. Can any of you help me?

Have you tried ElementTree? https://stackoverflow.com/a/1912483/5031672 — Zachary Blackwood, Aug 02 '17 at 18:23
@ZacharyBlackwood Yes, I did look at ElementTree, and I'm having a hard time extracting the definition because there is no value associated with it, unlike in the example you gave — danni1234, Aug 02 '17 at 18:30

Gopal Chitalia · Answer 1 · 2017-08-02T19:09:13.117

This will work for you:

import xml.etree.ElementTree as ET

data = '''
<entry_list version="1.0">
  <entry id="polycystic kidney disease">
    <ew>polycystic kidney disease</ew>
    <hw>polycystic kidney disease</hw>
    <fl>noun</fl>
    <def>
      <sensb>
        <sens>
          <dt> Blah blah blah
          </dt>
        </sens>
      </sensb>
    </def>
  </entry>
</entry_list>'''

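# fromstring() returns the root element, <entry_list>
flag = ET.fromstring(data)
# The path is relative to the root; strip() trims the surrounding whitespace
print(flag.find('entry/def/sensb/sens/dt').text.strip())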
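The question asks about XML that comes from a URL, while the snippet above parses a hard-coded string. A minimal sketch of the URL case, assuming Python 3 and the standard library only, might look like the following. The names `extract_definitions` and `fetch_definitions` are made up for illustration, and the URL is left as a parameter because the question does not give it. The sketch uses `iter("dt")` so the full path is not needed, and `itertext()` in case a `dt` element contains nested tags, where `.text` alone would return only the text before the first child.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen


def extract_definitions(xml_text):
    """Return the stripped text of every <dt> element in the document."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so the exact path to <dt> is not required.
    # itertext() also collects text that sits inside child tags of <dt>.
    return ["".join(dt.itertext()).strip() for dt in root.iter("dt")]


def fetch_definitions(url):
    """Download the XML at `url` (hypothetical, not given in the question)
    and return its definitions."""
    with urlopen(url) as response:
        # fromstring() accepts the raw bytes returned by read()
        return extract_definitions(response.read())
```

Calling `extract_definitions(data)` on the sample string above should give a one-element list with the definition text, and `fetch_definitions(your_url)` does the same for a live response.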

score 0 · Accepted Answer · answered Aug 02 '17 at 18:54

If you install BeautifulSoup, something like this should work:

from bs4 import BeautifulSoup

xml = '''<entry_list version="1.0">
  <entry id="polycystic kidney disease">
    <ew>polycystic kidney disease</ew>
    <hw>polycystic kidney disease</hw>
    <fl>noun</fl>
    <def>
      <sensb>
        <sens>
          <dt> Blah blah blah
          </dt>
        </sens>
      </sensb>
    </def>
  </entry>
</entry_list>'''

# Name the parser explicitly; "html.parser" ships with Python
parsed = BeautifulSoup(xml, "html.parser")

for dt in parsed.find_all("dt"):
    print(dt.get_text().strip())
