0

I've already seen this question, but it's from the 2009.
What's a simple modern way to handle XML files in Python 3?

I.e., from this TLD (adapted from here):

<?xml version="1.0" encoding="UTF-8" ?>
<taglib>
  <tlib-version>1.0</tlib-version>
  <short-name>bar-baz</short-name>

  <tag>
  <name>present</name>
     <tag-class>condpkg.IfSimpleTag</tag-class>
  <body-content>scriptless</body-content>

  <attribute>
    <name>test</name>
    <required>true</required>
    <rtexprvalue>true</rtexprvalue>
  </attribute>

  </tag> 

</taglib>

I want to parse TLD files (Java Server Pages Tag Library Descriptors), to obtain some sort of structure in Python (I have still to decide about that part).

Hence, I need a push parser. But I won't do much more with it, so I'd rather prefer a simple API (I'm new to Python).

Community
  • 1
  • 1
watery
  • 5,026
  • 9
  • 52
  • 92
  • What is your desired output? What data do you need to get from the XML? – alecxe Aug 30 '14 at 16:21
  • @alecxe I'm thinking about providing informations for an autocomplete feature of an editor, but I have yet to analyze the problem and understand that editor features; anyway I think that my effort should be data structure independent, so that, if one day I need to provide the parsed info to another piece of software, I'll just change the holding data structure, but not the parsing logic. But, as I said, everything is still at the beginning, so any suggestion is welcome. – watery Aug 30 '14 at 16:25

1 Answers1

0

xml.etree.ElementTree is still there, in the standard library:

import xml.etree.ElementTree as ET

data = """your xml here"""

tree = ET.fromstring(data)
print(tree.find('tag/name').text)  # prints "present"

If you look outside of the standard library, there is a very popular and fast lxml module that follows the ElementTree interface and supports Python3:

from lxml import etree as ET

data = """your xml here"""

tree = ET.fromstring(data)
print(tree.find('tag/name').text)  # prints "present"

Besides, there is lxml.objectify that allows you to deal with XML structure like with a Python object.

alecxe
  • 462,703
  • 120
  • 1,088
  • 1,195
  • I was thinking more to something like providing the XML file, a callback and the parser would invoke the callback providing each node context. – watery Aug 30 '14 at 16:27
  • @watery hm, then I think you would find [`iterparse()`](https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.iterparse) helpful. Take a look at the relevant threads: http://stackoverflow.com/questions/9856163/using-lxml-and-iterparse-to-parse-a-big-1gb-xml-file, http://stackoverflow.com/questions/12792998/elementtree-iterparse-strategy. Hope that helps. – alecxe Aug 30 '14 at 16:32
  • I moved away from that project pretty soon, I hadn't had a chance to give this a try. – watery Mar 02 '16 at 13:26