
I've been trying for hours to parse this sample XML from a URL using Python, but I can't extract the definition. Here is what a sample looks like:

<entry_list version="1.0">
  <entry id="polycystic kidney disease">
    <ew>polycystic kidney disease</ew>
    <hw>polycystic kidney disease</hw>
    <fl>noun</fl>
    <def>
      <sensb>
        <sens>
          <dt> Blah blah blah
          </dt>
        </sens>
      </sensb>
    </def>
  </entry>
</entry_list>

I'm trying to access the 'dt' tag because that is where my definition is. This is a shortened version of the XML that the URL returns. Can any of you help me?

danni1234

2 Answers


This should work for you:

import xml.etree.ElementTree as ET

data = '''
<entry_list version="1.0">
  <entry id="polycystic kidney disease">
    <ew>polycystic kidney disease</ew>
    <hw>polycystic kidney disease</hw>
    <fl>noun</fl>
    <def>
      <sensb>
        <sens>
          <dt> Blah blah blah
          </dt>
        </sens>
      </sensb>
    </def>
  </entry>
</entry_list>'''

root = ET.fromstring(data)
# find() returns the first element matching the path, relative to the root <entry_list>
print(root.find('entry/def/sensb/sens/dt').text.strip())
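The call above returns only the first 'dt' and depends on the exact nesting. If the real response has several entries, or a 'dt' that contains child tags, a more tolerant version might look like the sketch below. Note that `extract_definitions` and `fetch_definitions` are names made up for this example, and the URL you pass in is whatever endpoint you are actually calling; I'm assuming it returns XML shaped like the sample.

```python
import urllib.request
import xml.etree.ElementTree as ET


def extract_definitions(xml_text):
    """Return the stripped text of every <dt> element in the document."""
    root = ET.fromstring(xml_text)
    definitions = []
    # iter("dt") walks the whole tree, so the exact nesting does not matter.
    for dt in root.iter("dt"):
        # itertext() also collects text inside any child tags of <dt>.
        text = "".join(dt.itertext()).strip()
        if text:
            definitions.append(text)
    return definitions


def fetch_definitions(url):
    """Download the XML at `url` (hypothetical endpoint) and extract definitions."""
    with urllib.request.urlopen(url) as response:
        # fromstring() accepts the raw bytes returned by read().
        return extract_definitions(response.read())


sample = """<entry_list version="1.0">
  <entry id="polycystic kidney disease">
    <def><sensb><sens>
      <dt> Blah blah blah
      </dt>
    </sens></sensb></def>
  </entry>
</entry_list>"""

print(extract_definitions(sample))  # ['Blah blah blah']
```

Keeping the parsing in its own function means you can try it on a pasted string first and only then point `fetch_definitions` at the live URL.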
Gopal Chitalia

If you install BeautifulSoup (the bs4 package), something like this should work:

from bs4 import BeautifulSoup

xml = '''<entry_list version="1.0">
  <entry id="polycystic kidney disease">
    <ew>polycystic kidney disease</ew>
    <hw>polycystic kidney disease</hw>
    <fl>noun</fl>
    <def>
      <sensb>
        <sens>
          <dt> Blah blah blah
          </dt>
        </sens>
      </sensb>
    </def>
  </entry>
</entry_list>'''

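# Name the parser explicitly; the built-in "html.parser" needs no extra install
parsed = BeautifulSoup(xml, "html.parser")

for dt in parsed.find_all("dt"):
    print(dt.get_text(strip=True))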