Parsing XML attribute with namespace python3

Question

I have looked at the other question, "Parsing XML with namespace in Python via 'ElementTree'", and reviewed the xml.etree.ElementTree documentation. My issue is admittedly similar, so feel free to mark this as a duplicate, but I can't figure it out.

The line of code I'm having issues with is

instance_alink = root.find('{http://www.w3.org/2005/Atom}link')

My code is as follows:

import xml.etree.cElementTree as ET

tree = ET.parse('../../external_data/rss.xml')
root = tree.getroot()

instance_title = root.find('channel/title').text
instance_link = root.find('channel/link').text
instance_alink = root.find('{http://www.w3.org/2005/Atom}link')
instance_description = root.find('channel/description').text
instance_language = root.find('channel/language').text
instance_pubDate = root.find('channel/pubDate').text
instance_lastBuildDate = root.find('channel/lastBuildDate').text

The XML file:

<?xml version="1.0" encoding="windows-1252"?>
<rss version="2.0">
  <channel>
    <title>Filings containing financial statements tagged using the US GAAP or IFRS taxonomies.</title>
    <link>http://www.example.com</link>
    <atom:link href="http://www.example.com" rel="self" type="application/rss+xml" xmlns:atom="http://www.w3.org/2005/Atom"/>
    <description>This is a list of up to 200 of the latest filings containing financial statements tagged using the US GAAP or IFRS taxonomies, updated every 10 minutes.</description>
    <language>en-us</language>
    <pubDate>Mon, 20 Nov 2017 20:20:45 EST</pubDate>
    <lastBuildDate>Mon, 20 Nov 2017 20:20:45 EST</lastBuildDate>
....

The attributes I'm trying to retrieve are on line 6 of the XML: 'href', 'type', etc.

<atom:link href="http://www.example.com" rel="self" type="application/rss+xml" xmlns:atom="http://www.w3.org/2005/Atom"/>

Obviously, I've tried

instance_alink = root.find('{http://www.w3.org/2005/Atom}link').attrib

but that doesn't work because find() returns None, and None has no attrib. My thought is that it's looking for children, but there are none. I can grab the values from the other lines in the XML, but not these for some reason. I've also played with ElementTree and lxml (but lxml won't load properly on Windows for whatever reason).

Any help is greatly appreciated because the documentation seems sparse.

score 0 · Answer 1 · answered Nov 21 '17 at 05:13


I was able to solve it with

alink = root.find('channel/{http://www.w3.org/2005/Atom}link').attrib

The issue was that I was looking for the tag {http://www.w3.org/2005/Atom}link directly under the root, at the same level as <channel>, where of course it doesn't exist. The element is a child of <channel>, so the path has to go through it.
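For anyone landing here later, a minimal sketch of the same fix using a prefix map instead of the full {uri} syntax. The XML is a trimmed copy of the feed from the question, inlined so the snippet runs on its own; the names RSS, NS, missing, alink, attrs and anywhere are just illustrative, and the "atom" prefix in NS is my own choice (only the URI has to match the document).

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the feed from the question (assumption: only the
# elements relevant to the namespace lookup are kept).
RSS = """<rss version="2.0">
  <channel>
    <link>http://www.example.com</link>
    <atom:link href="http://www.example.com" rel="self" type="application/rss+xml" xmlns:atom="http://www.w3.org/2005/Atom"/>
  </channel>
</rss>"""

root = ET.fromstring(RSS)  # root is the <rss> element

# Prefix -> namespace URI map passed to find(); the prefix is arbitrary.
NS = {"atom": "http://www.w3.org/2005/Atom"}

# A bare tag only matches direct children of <rss>, so this returns None.
missing = root.find("atom:link", NS)

# Going through <channel> reaches the element.
alink = root.find("channel/atom:link", NS)
attrs = dict(alink.attrib)

# ".//" searches all descendants, if the exact parent is not known.
anywhere = root.find(".//atom:link", NS)

print(missing)
print(attrs["href"], attrs["rel"], attrs["type"])
```

Note that the xmlns:atom declaration does not show up in attrib; ElementTree consumes it as a namespace declaration, so only href, rel and type are returned.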


mrruz
