I'm trying to use xml.etree.ElementTree in Python. It works with the sample code, but not with my other code.

Example: the program works fine with this XML file:

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

Program:

import xml.etree.ElementTree as ET
tree = ET.parse('country.xml')
root = tree.getroot()
for page in root.findall('country'):
    print("inside")

Output:

inside
inside
inside

The same approach does not work with the following XML file:

<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.8/ http://www.mediawiki.org/xml/export-0.8.xsd" version="0.8" xml:lang="en">
  <siteinfo>
    <sitename>Wikipedia</sitename>
    <base>http://en.wikipedia.org/wiki/Main_Page</base>
    <generator>MediaWiki 1.23wmf11</generator>
    <case>first-letter</case>
    <namespaces>
      <namespace key="-2" case="first-letter">Media</namespace>
    </namespaces>
  </siteinfo>
  <page>
    <title>Affirming the consequent</title>
    <ns>0</ns>
    <id>675</id>
  </page>
</mediawiki>

Code:

import xml.etree.ElementTree as ET
tree = ET.parse('sample.xml')
root = tree.getroot()
for page in root.findall('page'):
    print("inside")

Output: No Output.

I figured out that the cause is the attributes on the mediawiki tag, but I can't remove that tag from my sample data. Is there any way to make this work?

user1919035

1 Answer

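Try this. The tag has to be qualified with the full namespace URI from the xmlns attribute, written in braces:

for page in root.findall('{http://www.mediawiki.org/xml/export-0.8/}page'):
    print(page)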
jeremyjjbrown
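To see why the plain 'page' lookup finds nothing, it can help to print the tags ElementTree actually stores. This is a minimal sketch, assuming a trimmed copy of the question's XML is inlined as a string (the `XML` variable is only for this demo) instead of being read from sample.xml:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the MediaWiki export from the question.
# Assumption: inlined as a string so the example is self-contained.
XML = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/" version="0.8" xml:lang="en">
  <siteinfo>
    <sitename>Wikipedia</sitename>
  </siteinfo>
  <page>
    <title>Affirming the consequent</title>
    <ns>0</ns>
    <id>675</id>
  </page>
</mediawiki>"""

root = ET.fromstring(XML)

# Every tag is stored with the namespace URI in braces in front of it.
child_tags = [child.tag for child in root]
print(root.tag)
print(child_tags)

# The unqualified name matches nothing; the qualified name matches the page.
plain_matches = root.findall('page')
qualified_matches = root.findall('{http://www.mediawiki.org/xml/export-0.8/}page')
print(len(plain_matches), len(qualified_matches))
```

The default xmlns on the mediawiki element applies to all its children, which is why removing the attributes made the original loop work.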
  • 1st code is correct. I don't want to display content. In the 2nd XML file, I don't see any output. – user1919035 Feb 22 '14 at 03:28
  • Try `print(list(root))` (or `root.getchildren()` on older Python versions, where it issues a deprecation warning) to see what the tags look like with the xmlns included. – jeremyjjbrown Feb 22 '14 at 03:35
  • It displays the child elements. Please look at the 2nd XML file and program. The 2nd program works only if I remove all the attributes from the mediawiki tag. Am I wrong somewhere? – user1919035 Feb 22 '14 at 03:47
  • [, ] This is the display of getchildren() – user1919035 Feb 22 '14 at 04:17
  • Adding the full namespace to findall is working for me: "root.findall('{http://www.mediawiki.org/xml/export-0.8/}page')", however I can't figure out how to tell ElementTree to use this as the default to allow "root.findall('page')". – Joseph Sheedy Apr 22 '14 at 02:46
  • See http://stackoverflow.com/questions/14853243/parsing-xml-with-namespace-in-python-elementtree. An accepted answer and/or upvote would be appreciated, since I took the time to answer your question. – jeremyjjbrown Apr 23 '14 at 00:49
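On the question of avoiding the full URI in every call: one option is to pass a prefix-to-URI mapping as the `namespaces` argument of `findall`. A rough sketch follows, assuming the same trimmed XML inlined as a string; the `mw` prefix and the `NS` and `DEFAULT_NS` names are arbitrary choices for this example, and the empty-string prefix form needs Python 3.8 or later:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the MediaWiki export from the question.
# Assumption: inlined as a string so the example is self-contained.
XML = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/" version="0.8" xml:lang="en">
  <page>
    <title>Affirming the consequent</title>
    <ns>0</ns>
    <id>675</id>
  </page>
</mediawiki>"""

root = ET.fromstring(XML)

# Option 1: map a prefix of your choosing to the namespace URI.
# 'mw' is a made-up prefix; it does not need to appear in the XML.
NS = {'mw': 'http://www.mediawiki.org/xml/export-0.8/'}
titles = [
    page.findtext('mw:title', namespaces=NS)
    for page in root.findall('mw:page', NS)
]
print(titles)

# Option 2 (Python 3.8+): map the empty prefix to the URI so that
# unprefixed names in the path are looked up in that namespace.
DEFAULT_NS = {'': 'http://www.mediawiki.org/xml/export-0.8/'}
default_pages = root.findall('page', DEFAULT_NS)
print(len(default_pages))
```

The mapping still has to be passed on each call, since ElementTree has no global setting that makes `root.findall('page')` namespace-aware on its own.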