3

I'd like to parse an XML document using Python's xml.etree.ElementTree module. However, I want all the elements in the resulting tree object to have some methods that I define. This suggests creating my own subclass of Python's element class, but I'm having trouble telling the parser to use my own element subclass when parsing, instead of the built-in class.

For example, let's say I want the nodes in the tree to have a new method called custommethod(). To do this, I create an element subclass:

class MyElement(xml.etree.ElementTree._Element):

    def custommethod(self):
        . . . 

Now, when I parse a tree using

tree = xml.etree.ElementTree.parse(source)

I want all the elements in tree to have the custommethod() method. So,

tree.getroot().custommethod()

should not fail.

But I don't know how to tell the parser to use my element class - is this even possible? There are some hints in the Python documentation about passing a custom parser to .parse(), but not a lot of details.

user592838

6 Answers

3

A custom XML parser is a subclass of xml.etree.ElementTree.XMLParser with four functions defined:

  1. start(self, tag, attrs) Called when an opening tag is found.
  2. end(self, tag) Called when a closing tag is found.
  3. data(self, data) Called when data is found.
  4. close(self) Called at the end of the parse.

You must manage everything else yourself: how the node instances are created, the depth of each tag, and so on. Notice that the data(self, data) method doesn't have a tag argument.
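
The "manage everything else" part can be sketched roughly as follows. This assumes a recent Python 3, where the four callbacks are normally supplied on a separate target object passed to XMLParser(target=...) rather than on an XMLParser subclass. MyTarget, its private attributes and the body of custommethod() are illustrative names, not part of the library.

```python
import xml.etree.ElementTree as ET


class MyElement(ET.Element):
    def custommethod(self):
        # Placeholder body, for illustration only.
        return "custom:" + self.tag


class MyTarget:
    """Parser target that builds a tree of MyElement nodes by hand."""

    def __init__(self):
        self._stack = []   # currently open elements; depth == len(self._stack)
        self._root = None  # first element seen
        self._last = None  # most recently closed element (receives tail text)

    def start(self, tag, attrs):
        elem = MyElement(tag, attrs)
        if self._stack:
            self._stack[-1].append(elem)  # attach to the open parent
        else:
            self._root = elem
        self._stack.append(elem)
        self._last = None

    def end(self, tag):
        self._last = self._stack.pop()

    def data(self, data):
        # No tag argument: the target decides which node the text belongs to.
        if self._last is not None:
            self._last.tail = (self._last.tail or "") + data
        elif self._stack:
            top = self._stack[-1]
            top.text = (top.text or "") + data

    def close(self):
        return self._root  # becomes the return value of parser.close()


parser = ET.XMLParser(target=MyTarget())
root = ET.fromstring("<a x='1'>t<b>u</b>v</a>", parser=parser)
print(type(root).__name__, root.custommethod())
```

The stack gives the current depth, and tracking the last closed element is what lets data() distinguish an element's text from a sibling's tail.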

Terseus
  • How does one go about doing the "manage everything else" part? I haven't been able to find any documentation on that, either here on SO or in the Python documentation, nor with pretty extensive googling. I don't expect a whole tutorial or anything, but a few hints, or a link to a document or something...? – Wilson F Jun 26 '16 at 22:27
1

In Python 3, monkey patching the Element class did not work properly. I suspect this is due to three reasons:

  1. Overriding __class__, __dict__, etc. of the built-in class doesn't work.
  2. ElementTree uses C to do the heavy lifting.
  3. The classes used for parsing form a web of class instantiations.
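
A small check of the first two points, assuming CPython with the C accelerator for ElementTree available (without it, the assignment may simply succeed). The function name shout is made up for the demonstration.

```python
import xml.etree.ElementTree as ET


def shout(self):
    """Hypothetical method we would like every element to have."""
    return self.tag.upper()


try:
    ET.Element.shout = shout  # attempt the monkey patch
    patched = True
except TypeError as exc:      # raised when Element is the C-implemented type
    patched = False
    print("monkey patch rejected:", exc)

print("patched:", patched)
```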

Solution: Create your own parse function!

import xml.etree.ElementTree
def parse(source):
    """Parse wrapper to build a tree with the extended Node class"""
    treebuilder = xml.etree.ElementTree.TreeBuilder(element_factory=MyElement)
    parser = xml.etree.ElementTree.XMLParser(target=treebuilder)
    tree = xml.etree.ElementTree.parse(source, parser)
    return tree

The example above uses your MyElement class instead of ElementTree.Element for every object of the tree.
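
For completeness, here is a self-contained sketch of the same idea, including a variant for parsing from a string. The body of custommethod() is only a placeholder (it rewrites the tag, as an example of changing element state), and make_parser() and fromstring() are helper names introduced here for illustration.

```python
import io
import xml.etree.ElementTree as ET


class MyElement(ET.Element):
    """Element subclass carrying an extra, purely illustrative method."""

    def custommethod(self):
        # Rename the tag in place to show that instance state is writable.
        self.tag = self.tag.upper()
        return self.tag


def make_parser():
    """Return a fresh XMLParser whose TreeBuilder creates MyElement nodes."""
    builder = ET.TreeBuilder(element_factory=MyElement)
    return ET.XMLParser(target=builder)


def parse(source):
    """Like ET.parse, but every node in the tree is a MyElement."""
    return ET.parse(source, make_parser())


def fromstring(text):
    """Like ET.fromstring, but every node in the tree is a MyElement."""
    return ET.fromstring(text, parser=make_parser())


tree = parse(io.StringIO("<root><child>hi</child></root>"))
root = tree.getroot()
print(type(root).__name__, root.custommethod())
print(all(isinstance(node, MyElement) for node in root.iter()))
```

A parser object cannot be reused after it has been closed, which is why each call builds a new one.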

Ferschae Naej
  • Awesome! To parse element from string, in addition check @dgassaway [answer](https://stackoverflow.com/a/18281386/10815899) – Viljami Nov 10 '22 at 13:44
1

The book Dive Into Python covers this subject quite extensively, and the relevant chapter is online. Near the bottom, it explains the steps to create a custom XML parser. I'm not sure it gives you the information you need, but perhaps it's a good starting point.

Abel
0

You can specify the element class to use while creating the TreeBuilder instance, like this:

# cElementTree was deprecated in Python 3.3 and removed in 3.9; use ElementTree.
from xml.etree import ElementTree as etree

class MyCustomElement(etree.Element):
    pass

tb = etree.TreeBuilder(element_factory=MyCustomElement)

Ref: https://github.com/python/cpython/blob/master/Lib/xml/etree/ElementTree.py#L1370
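
The TreeBuilder still has to be handed to a parser before it does anything. A short sketch of one way to do that, feeding the document in chunks; child_tags() is a hypothetical helper added only so there is something to call.

```python
from xml.etree import ElementTree as etree


class MyCustomElement(etree.Element):
    def child_tags(self):
        """Hypothetical helper: list the tags of direct children."""
        return [child.tag for child in self]


tb = etree.TreeBuilder(element_factory=MyCustomElement)
parser = etree.XMLParser(target=tb)

# feed() accepts the document piece by piece; close() returns the root.
for chunk in ("<doc><a/>", "<b/></doc>"):
    parser.feed(chunk)
doc = parser.close()

print(type(doc).__name__, doc.child_tags())
```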

onyb
0

Simply add a custom method to _Element:

>>> import xml.etree.ElementTree as e  # assumption: "e" refers to this module
>>> def customMethod(self):
...     print("Done!")
...
>>> e._Element.customMethod = customMethod
synthesizerpatel
Kabie
  • This almost worked. One of the things I want to do is change the tag of the element, which means doing self.tag = "something else" in the function. If I do this, I can call the function as a method of _Element, but it only prints "Done!" without changing the tag. – user592838 Mar 08 '11 at 19:00
  • Assigning a new custom method after parsing the tree got me looking, and I found this, which is almost the right thing: http://stackoverflow.com/questions/962962/python-changing-methods-and-attributes-at-runtime – user592838 Mar 08 '11 at 19:02
0

I was looking for a solution to this problem and found that lxml supports it directly.

>>> from lxml import etree

>>> class honk(etree.ElementBase):
...    @property
...    def honking(self):
...       return self.get('honking') == 'true'

See the lxml documentation on element classes.

janh