
I'm trying to parse an XML document in Python so that I can manipulate the data and write out a new file. The full file that I'm working with is here, but this is an excerpt:

<?xml version="1.0" encoding="UTF-8"?>
<FMPXMLRESULT xmlns="http://www.filemaker.com/fmpxmlresult">
    <ERRORCODE>0</ERRORCODE>
    <PRODUCT BUILD="09-11-2013" NAME="FileMaker" VERSION="ProAdvanced 12.0v5"/>
    <DATABASE DATEFORMAT="M/d/yyyy" LAYOUT="" NAME="All gigs 88-07.fmp12" RECORDS="746" TIMEFORMAT="h:mm:ss a"/>
    <METADATA>
        <FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="Country" TYPE="TEXT"/>
        <FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="Year" TYPE="TEXT"/>
        <FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="City" TYPE="TEXT"/>
        <FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="State" TYPE="TEXT"/>
        <FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="Theater" TYPE="TEXT"/>
    </METADATA>
    <RESULTSET FOUND="746">
        <ROW MODID="3" RECORDID="32">
            <COL>
                <DATA/>
            </COL>
            <COL>
                <DATA>1996</DATA>
            </COL>
            <COL>
                <DATA>Pompano Beach</DATA>
            </COL>
            <COL>
                <DATA>FL</DATA>
            </COL>
            <COL>
                <DATA>First Presbyterian Church</DATA>
            </COL>
        </ROW>
        <ROW MODID="3" RECORDID="33">
            <COL>
                <DATA/>
            </COL>
            <COL>
                <DATA>1996</DATA>
            </COL>
            <COL>
                <DATA>Hilton Head</DATA>
            </COL>
            <COL>
                <DATA>SC</DATA>
            </COL>
            <COL>
                <DATA>Self Family Arts Center</DATA>
            </COL>
        </ROW>
        <!-- snip many more ROW elements -->
    </RESULTSET>
</FMPXMLRESULT>

Eventually, I want to use the information in the METADATA element to interpret the columns in the RESULTSET, but for now I’m having trouble just getting a handle on the data. Here is what I’ve tried in order to get the METADATA element:

import xml.etree.ElementTree as ET

tree = ET.parse('giglist.xml')
root = tree.getroot()
print root
metadata = tree.find("METADATA")
print metadata

This prints out:

<Element '{http://www.filemaker.com/fmpxmlresult}FMPXMLRESULT' at 0x10f982cd0>
None

Why is metadata None? Am I misusing the find() method?

alecxe
Zev Eisenberg

1 Answer


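You need to handle namespaces. The root element declares a default namespace (xmlns="http://www.filemaker.com/fmpxmlresult"), so every element in the document, including METADATA, has a tag qualified with that URI. A plain find("METADATA") looks for an element with no namespace and therefore returns None.

Since only a default namespace is declared, you can find the element by putting the namespace URI in braces before the tag name: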

import xml.etree.ElementTree as ET

ns = 'http://www.filemaker.com/fmpxmlresult'

tree = ET.parse('giglist.xml')
root = tree.getroot()

metadata = root.find("{%s}METADATA" % ns)
print metadata  # prints <Element '{http://www.filemaker.com/fmpxmlresult}METADATA' at 0x103ccbe90>
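
As a sketch of an alternative to formatting the URI into every path, ElementTree's find() and findall() also accept a prefix-to-URI mapping. The example below is written for Python 3 and runs against a trimmed-down stand-in for giglist.xml; the `fm` prefix and the `SAMPLE` string are illustrative assumptions, not part of the original file.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for giglist.xml (assumption: same default namespace).
SAMPLE = """<FMPXMLRESULT xmlns="http://www.filemaker.com/fmpxmlresult">
    <ERRORCODE>0</ERRORCODE>
    <METADATA>
        <FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="Country" TYPE="TEXT"/>
        <FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="Year" TYPE="TEXT"/>
    </METADATA>
</FMPXMLRESULT>"""

# "fm" is an arbitrary prefix chosen for this example; only the URI matters.
NS = {"fm": "http://www.filemaker.com/fmpxmlresult"}

root = ET.fromstring(SAMPLE)

# Every child tag carries the namespace URI in braces.
child_tags = [child.tag for child in root]

# An unqualified name matches nothing, which is why the question saw None.
unqualified = root.find("METADATA")

# With the prefix map, the path stays readable.
metadata = root.find("fm:METADATA", NS)
field_names = [field.get("NAME") for field in metadata.findall("fm:FIELD", NS)]

print(child_tags)
print(unqualified)
print(field_names)
```

Printing the child tags first is a quick way to check which namespace, if any, a document's elements actually live in.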


Update (getting the list of results as dictionaries keyed by the METADATA field names):

import xml.etree.ElementTree as ET

ns = 'http://www.filemaker.com/fmpxmlresult'

tree = ET.parse('giglist.xml')
root = tree.getroot()

keys = [field.attrib['NAME'] for field in root.findall(".//{%(ns)s}METADATA/{%(ns)s}FIELD" % {'ns': ns})]
results = [dict(zip(keys, [col.text for col in row.findall(".//{%(ns)s}COL/{%(ns)s}DATA" % {'ns': ns})]))
           for row in root.findall(".//{%(ns)s}RESULTSET/{%(ns)s}ROW" % {'ns': ns})]

print results

Prints:

[
    {'City': 'Pompano Beach', 'Country': None, 'State': 'FL', 'Theater': 'First Presbyterian Church', 'Year': '1996'}, 
    {'City': 'Hilton Head', 'Country': None, 'State': 'SC', 'Theater': 'Self Family Arts Center', 'Year': '1996'}
]
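
The same idea can be sketched for Python 3 with a prefix map and an explicit loop, which may be easier to adapt than the nested comprehension. The helper name `rows_as_dicts`, the `fm` prefix, and the shortened `SAMPLE` document are assumptions made for illustration; the sketch reads only the first DATA child of each COL.

```python
import xml.etree.ElementTree as ET

# "fm" is an arbitrary prefix chosen for this example.
NS = {"fm": "http://www.filemaker.com/fmpxmlresult"}

# Shortened stand-in for giglist.xml (assumption: same structure as the excerpt).
SAMPLE = """<FMPXMLRESULT xmlns="http://www.filemaker.com/fmpxmlresult">
    <METADATA>
        <FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="Country" TYPE="TEXT"/>
        <FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="Year" TYPE="TEXT"/>
        <FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="City" TYPE="TEXT"/>
    </METADATA>
    <RESULTSET FOUND="1">
        <ROW MODID="3" RECORDID="32">
            <COL><DATA/></COL>
            <COL><DATA>1996</DATA></COL>
            <COL><DATA>Pompano Beach</DATA></COL>
        </ROW>
    </RESULTSET>
</FMPXMLRESULT>"""


def rows_as_dicts(root):
    """Return one dict per ROW, keyed by the FIELD names in METADATA."""
    keys = [field.get("NAME") for field in root.findall("fm:METADATA/fm:FIELD", NS)]
    records = []
    for row in root.findall("fm:RESULTSET/fm:ROW", NS):
        values = []
        for col in row.findall("fm:COL", NS):
            data = col.find("fm:DATA", NS)
            # An empty <DATA/> element has text None, as in the output above.
            values.append(data.text if data is not None else None)
        # Columns are matched to field names by position.
        records.append(dict(zip(keys, values)))
    return records


records = rows_as_dicts(ET.fromstring(SAMPLE))
print(records)
```

For the real file, `ET.parse('giglist.xml').getroot()` would take the place of `ET.fromstring(SAMPLE)`.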
alecxe
  • Ooh, I had always assumed that namespace was just XML noise and clutter. I’ll give it a try; thanks. – Zev Eisenberg Oct 04 '14 at 20:01
  • this seemingly also returns the `<METADATA>` opening tag. Is there any way to avoid this, besides just filtering it out when I use the results? – Zev Eisenberg Oct 04 '14 at 20:22
  • @ZevEisenberg yup, the code I've posted returns the metadata tag. What data do you want to get from the xml file? What is your desired output? Thanks. – alecxe Oct 04 '14 at 20:35
  • I'm trying to get the `ROW` elements into an array of dictionaries/tuples/objects that I can work with in Python. Was assuming I would use `METADATA` to figure out what each element inside a `ROW` is. – Zev Eisenberg Oct 04 '14 at 20:45
  • big picture: trying to make a script that can generate this page in one click from a FileMaker database, instead of having to update it by hand: http://www.avnertheeccentric.com/giglist.php – Zev Eisenberg Oct 04 '14 at 20:46
  • to clarify: your code gives me the `METADATA` tag and all its children as well. Trying to get just the children, without the parent. – Zev Eisenberg Oct 04 '14 at 20:49
  • @ZevEisenberg thank you for the information - I would take a look later today. – alecxe Oct 04 '14 at 20:50
  • @ZevEisenberg please see the updated answer (sorry for the delay - had too many cocktails yesterday :)) – alecxe Oct 05 '14 at 15:56
  • amazing. Looks like I bit off more than I could chew, but your code works perfectly. – Zev Eisenberg Oct 05 '14 at 16:03