Parsing XML in Python

Question

I have a database in XML like the one below, and I'm trying to parse it with Python 2.7:

<team>
    <generator>
        <team_name>TeamMaster</team_name>
        <team_year>2000</team_year>
        <team_city>NewYork</team_city>
    </generator>
    <players>
        <definition name="John V." number="4" age="25">
          <criteria position="fow" side="right">
            <criterion website="www.johnV.com" version="1" result="true"/>
          </criteria>
          <object debut="2003" version="3" flag="complete">
            <history item_ref="team34"/>
            <history item_ref="mainteam"/>
          </object>
        </definition>
        <definition name="Emma" number="2" age="19">
          <criteria position="mid" side="left">
            <criterion website="www.emma.net" version="7" result="true"/>
          </criteria>
          <object debut="2008" version="1" flag="complete">
            <history item_ref="newteam"/>
            <history item_ref="youngteam"/>
            <history item_ref="oldteam"/>
          </object>
        </definition>
    </players>
</team>

With this small script I can easily parse the first part, "generator", of my XML, where I know all the elements it contains:

from xml.dom.minidom import parseString

# placeholders, filled in as the XML is parsed
mydb = {
"team_name": None,
"team_year": None,
"team_data": None
}

# read the whole file; the with-block closes it automatically
with open('mydb.xml', 'r') as f:
    data = f.read()
dom = parseString(data)
# retrieve the first xml tag (<tag>data</tag>) that the parser finds with name tagName:
xmlTag = dom.getElementsByTagName('team_name')[0].toxml()
# strip off the tag (<tag>data</tag>  --->   data):
xmlData = xmlTag.replace('<team_name>', '').replace('</team_name>', '')

mydb["team_name"] = xmlData # TeamMaster

But my real problem came when I tried to parse the "players" elements, where attributes appear in "definition" and there is an unknown number of "history" elements. Maybe there is another module that would help me with this better than minidom?

Maybe this can assist you: [XML Parsing with Python and minidom](http://stackoverflow.com/a/1597645/1762224). -- *"getElementsByTagName is recursive, you'll get all descendents with a matching tagName."* — Mr. Polywhirl, Apr 22 '14 at 11:13
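To illustrate the quoted comment, here is a minimal minidom sketch. It assumes a trimmed copy of the question's XML in which each `<object>` element is closed with `</object>` (the document must be well-formed to parse), and the `players` list layout is only one possible choice. Attributes are read with `getAttribute`, and because `getElementsByTagName` is recursive, it picks up any number of `<history>` elements nested under a `<definition>`.

```python
from xml.dom.minidom import parseString

# Trimmed sample based on the question's XML; the closing </object> tags are
# an assumption, added so that the document is well-formed.
XML = """<team>
    <generator>
        <team_name>TeamMaster</team_name>
    </generator>
    <players>
        <definition name="John V." number="4" age="25">
          <object debut="2003" version="3" flag="complete">
            <history item_ref="team34"/>
            <history item_ref="mainteam"/>
          </object>
        </definition>
        <definition name="Emma" number="2" age="19">
          <object debut="2008" version="1" flag="complete">
            <history item_ref="newteam"/>
            <history item_ref="youngteam"/>
            <history item_ref="oldteam"/>
          </object>
        </definition>
    </players>
</team>"""

dom = parseString(XML)

# Text content lives in a child text node, so no string replacing is needed.
team_name = dom.getElementsByTagName("team_name")[0].firstChild.data

players = []
for definition in dom.getElementsByTagName("definition"):
    players.append({
        # attributes of <definition>
        "name": definition.getAttribute("name"),
        "number": definition.getAttribute("number"),
        "age": definition.getAttribute("age"),
        # recursive lookup: finds every <history> below this <definition>
        "history": [h.getAttribute("item_ref")
                    for h in definition.getElementsByTagName("history")],
    })
```

Note that `getAttribute` always returns strings, so values such as `number` and `age` would need an explicit `int(...)` if numbers are wanted.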

score 3 · Accepted Answer · answered Apr 22 '14 at 11:13

Better use xml.etree.ElementTree; it has a more Pythonic syntax. Get the text of team_name with root.findtext('generator/team_name'), or iterate over all definitions with root.iter('definition').
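A minimal sketch of this approach, assuming the question's XML with each `<object>` element closed by `</object>` so that it is well-formed. The `parse_team` function and the layout of the resulting dictionary are illustrative choices, not part of the original answer.

```python
import xml.etree.ElementTree as ET

# The question's XML; the closing </object> tags are an assumption,
# added so that the document is well-formed.
XML = """<team>
    <generator>
        <team_name>TeamMaster</team_name>
        <team_year>2000</team_year>
        <team_city>NewYork</team_city>
    </generator>
    <players>
        <definition name="John V." number="4" age="25">
          <criteria position="fow" side="right">
            <criterion website="www.johnV.com" version="1" result="true"/>
          </criteria>
          <object debut="2003" version="3" flag="complete">
            <history item_ref="team34"/>
            <history item_ref="mainteam"/>
          </object>
        </definition>
        <definition name="Emma" number="2" age="19">
          <criteria position="mid" side="left">
            <criterion website="www.emma.net" version="7" result="true"/>
          </criteria>
          <object debut="2008" version="1" flag="complete">
            <history item_ref="newteam"/>
            <history item_ref="youngteam"/>
            <history item_ref="oldteam"/>
          </object>
        </definition>
    </players>
</team>"""


def parse_team(xml_text):
    """Hypothetical helper: collect team info and players into a dict."""
    root = ET.fromstring(xml_text)  # root is the <team> element
    mydb = {
        # findtext takes a path relative to the element it is called on
        "team_name": root.findtext("generator/team_name"),
        "team_year": root.findtext("generator/team_year"),
        "team_city": root.findtext("generator/team_city"),
        "players": [],
    }
    for definition in root.iter("definition"):
        player = dict(definition.attrib)  # name, number, age
        criteria = definition.find("criteria")
        if criteria is not None:
            player["position"] = criteria.get("position")
            player["side"] = criteria.get("side")
        # iter() walks all descendants, so any number of <history> works
        player["history"] = [h.get("item_ref")
                             for h in definition.iter("history")]
        mydb["players"].append(player)
    return mydb


mydb = parse_team(XML)
```

To read from a file instead of a string, `ET.parse('mydb.xml').getroot()` gives the same root element.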

Daniel


score 0 · Answer 2 · answered May 26 '22 at 15:45

You can use either the ElementTree XML parser or the BeautifulSoup XML parser. I have created a repo showing how to use these XML parsers here: XML Parsers Collection

Code snippet below:

    #Get the data from XML parser.
    users = xml_parser(users_file,'user') 

    #Iterate through root element.
    for user in users:
        print(user.find('country').text)
        print(user.find('city').text)
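The `xml_parser` helper and the contents of `users_file` are not shown in this answer. As a purely hypothetical sketch of what such a helper might look like with ElementTree (the function body and the sample data below are assumptions, not the repo's actual code):

```python
import io
import xml.etree.ElementTree as ET


def xml_parser(source, tag):
    """Hypothetical stand-in for the answer's helper.

    Parses `source` (a filename or file-like object) and returns a list
    of every element named `tag`.
    """
    root = ET.parse(source).getroot()
    return list(root.iter(tag))


# Made-up sample data; the answer does not show what users_file contains.
users_file = io.StringIO(
    "<users>"
    "<user><country>France</country><city>Paris</city></user>"
    "<user><country>Japan</country><city>Tokyo</city></user>"
    "</users>"
)

users = xml_parser(users_file, "user")

# Collect (country, city) pairs; findtext returns None for a missing child
# instead of raising, unlike user.find('country').text.
locations = [(user.findtext("country"), user.findtext("city"))
             for user in users]
```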
