
For the following XML, how do I fetch it and then parse it to get the value of <age>?

<boardgames>
  <boardgame objectid="13">
  <yearpublished>1995</yearpublished>
  <minplayers>3</minplayers>
  <maxplayers>4</maxplayers>
  <playingtime>90</playingtime>
  <age>10</age>
  <name sortindex="1">Catan</name>
  ...

I'm currently trying:

result = urlfetch.fetch(url=game_url)
xml = ElementTree.fromstring(result.content)

But I'm not sure I'm on the right path. When I try to parse I get errors (I think because the xml is not valid xml).

jfs
Will Curran
  • Works fine when I grab the page with `urllib2`: `xml = ElementTree.fromstring(urllib2.urlopen('http://www.boardgamegeek.com/xmlapi/boardgame/13').read())` – moinudin Dec 29 '10 at 18:59
  • I'm getting the XML, but I don't know how to use ElementTree to grab the values of individual elements. So how do I grab the value for `<age>`? – Will Curran Dec 29 '10 at 19:02

2 Answers


xml.findtext('boardgame/age') or xml.findtext('.//age') would normally get you the 10 inside <age>10</age> (paths are relative to the root <boardgames> element, so a bare 'age' would not match), but the parsing appears to fail due to invalid XML. In my experience, ElementTree does a rather poor job of parsing invalid XML.
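
To make the path behaviour concrete, here is a small offline sketch in Python 3 of how ElementTree resolves paths against the root element. The `SAMPLE` string is a hypothetical, well-formed stand-in for the truncated XML in the question, not the real API response.

```python
from xml.etree import ElementTree

# Hypothetical, well-formed stand-in for the truncated sample in the question.
SAMPLE = """
<boardgames>
  <boardgame objectid="13">
    <yearpublished>1995</yearpublished>
    <age>10</age>
    <name sortindex="1">Catan</name>
  </boardgame>
</boardgames>
""".strip()

root = ElementTree.fromstring(SAMPLE)  # root is the <boardgames> element

# Paths are relative to the element findtext() is called on.
direct_child = root.findtext("age")             # direct children of <boardgames> only
explicit_path = root.findtext("boardgame/age")  # child <boardgame>, then its <age>
any_depth = root.findtext(".//age")             # first <age> anywhere below the root

print(direct_child, explicit_path, any_depth)
```

Because `<age>` sits one level below the root, only the explicit path and the `.//` search find it; `findtext` returns `None` when nothing matches.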

Instead use BeautifulSoup, which handles invalid xml well.

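import urllib2
from bs4 import BeautifulSoup  # assumes BeautifulSoup 4; for version 3 use: from BeautifulSoup import BeautifulSoup

content = urllib2.urlopen('http://boardgamegeek.com/xmlapi/boardgame/13').read()
soup = BeautifulSoup(content)
print soup.find('age').string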
moinudin

The following works for me:

import urllib2
from xml.etree import ElementTree

result = urllib2.urlopen('http://boardgamegeek.com/xmlapi/boardgame/13').read()
xml = ElementTree.fromstring(result)
print xml.findtext(".//age")
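
For readers on Python 3, where `urllib2` was split into `urllib.request` and `urllib.error`, the same approach might look roughly like the sketch below. The helper names `parse_age` and `fetch_age` are made up for illustration, and catching `ParseError` is one possible way to handle the malformed-XML case raised in the question, not something the original answer does.

```python
import urllib.request
from xml.etree import ElementTree


def parse_age(xml_bytes):
    """Return the text of the first <age> element, or None if absent or unparseable."""
    try:
        root = ElementTree.fromstring(xml_bytes)
    except ElementTree.ParseError:
        # Raised when the payload is not well-formed XML.
        return None
    return root.findtext(".//age")


def fetch_age(url):
    """Download the document at url and extract <age> (performs a network call)."""
    with urllib.request.urlopen(url) as response:
        return parse_age(response.read())
```

Calling `fetch_age('http://boardgamegeek.com/xmlapi/boardgame/13')` would mirror the snippet above, assuming that endpoint still responds with the same XML; keeping the parsing in its own function lets it be checked without network access.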
mzjn