How do I traverse an XML tree without having to worry about namespace prefixes in Python?

Question

For example, to read an RSS feed, this doesn't work because of the silly {http://purl.org ...} namespaces that get inserted before 'item':

#!/usr/bin/env python3
import xml.etree.ElementTree as ET
import urllib.request

url = "http://some/rss/feed"
response = urllib.request.urlopen(url)
xml_text = response.read().decode('utf-8')
xml_root = ET.fromstring(xml_text)
for e in xml_root.findall('item'):
  print("I found an item!")

Now that findall() has been rendered useless because of the {} prefixes, here's another solution, but this is ugly:

#!/usr/bin/env python3
import xml.etree.ElementTree as ET
import urllib.request

url = "http://some/rss/feed"
response = urllib.request.urlopen(url)
xml_text = response.read().decode('utf-8')
xml_root = ET.fromstring(xml_text)
for e in xml_root:
  if e.tag.endswith('}item'):
    print("I found an item!")

Can I get ElementTree to just trash all the prefixes?

score 1 · Answer 1 · edited May 23 '17 at 12:20

You need to handle namespaces as clearly explained at:

Parsing XML with namespace in Python via 'ElementTree'
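As a rough sketch of what that looks like in practice: findall() accepts a prefix-to-URI mapping, and on Python 3.8+ the `{*}` wildcard matches a tag in any namespace. The inline XML and the `http://example.com/ns` URI below are placeholders I made up for illustration, not the namespace of your feed.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the feed; the namespace URI is a placeholder.
XML_TEXT = """<root xmlns="http://example.com/ns">
  <item>first</item>
  <item>second</item>
</root>"""

root = ET.fromstring(XML_TEXT)

# Option 1: map a prefix of your own choosing to the namespace URI
# and use that prefix in the search path.
ns = {"f": "http://example.com/ns"}
items = root.findall("f:item", ns)

# Option 2 (Python 3.8+): the {*} wildcard matches 'item' in any
# namespace, or in no namespace at all.
items_any_ns = root.findall("{*}item")

for e in items_any_ns:
    print("I found an item!", e.text)
```

The wildcard form is probably the closest thing to "not worrying about prefixes" while leaving the tree untouched.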

Alternatively, you could use a specialized library for reading RSS feeds, such as feedparser:

>>> import feedparser
>>> url = "http://some/rss/feed"
>>> feed = feedparser.parse(url)
>>> for entry in feed.entries:  # one entry per item in the feed
...     print(entry.get("title"))

Personally, though, I would use Scrapy's XMLFeedSpider. As a bonus, you get all the other features of the Scrapy web-scraping framework.
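If you would rather stay with ElementTree and really do want the prefixes gone, one possible approach is to rewrite every tag after parsing. This is only a sketch: strip_namespaces is a helper name I made up, the sample XML and its URIs are placeholders, and it ignores namespaced attributes. Elements that share a local name across different namespaces become indistinguishable afterwards.

```python
import xml.etree.ElementTree as ET


def strip_namespaces(root):
    """Remove the '{uri}' part from every element tag, in place."""
    for elem in root.iter():
        # Comments and processing instructions have non-string tags; skip them.
        if isinstance(elem.tag, str) and elem.tag.startswith("{"):
            elem.tag = elem.tag.split("}", 1)[1]
    return root


# Hypothetical stand-in for the feed; both namespace URIs are placeholders.
XML_TEXT = """<root xmlns="http://example.com/ns" xmlns:x="http://example.com/other">
  <item>first</item>
  <x:item>second</x:item>
</root>"""

root = strip_namespaces(ET.fromstring(XML_TEXT))

# Plain tag names now work with findall().
found = [e.text for e in root.findall("item")]
for text in found:
    print("I found an item!", text)
```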
