5

I'm using xmltodict for XML parsing/unparsing, and I need to preserve the XML element ordering while processing one document. Toy REPL example:

>>> import xmltodict
>>> xml = """
... <root>
...   <a />
...   <b />
...   <a />
... </root>
... """
>>> xmltodict.parse(xml)
OrderedDict([('root', OrderedDict([('a', [None, None]), ('b', None)]))])
>>> xmltodict.unparse(_)
'<?xml version="1.0" encoding="utf-8"?>\n<root><a></a><a></a><b></b></root>'

Note that the original sequence [a, b, a] is replaced by [a, a, b]. Is there any way to preserve the original order with xmltodict?

el.atomo
  • 5,200
  • 3
  • 30
  • 28
  • 6
    From [the maintainer of the project](https://github.com/martinblech/xmltodict/issues/111): "The guiding principle behind the design of xmltodict is to implement one simple mapping between XML and nested dict+list+string structures resembling what you would get from a JSON document, no more, no less. I have no intention of making xmltodict a full fledged XML processing framework and I think that features like the one proposed in this ticket come at the expense of the library's ease of use and code simplicity/maintainability... if you're hitting xmltodict's limitations, you should use XPath...." – Bob Dylan Jan 06 '16 at 15:15
  • So there you have it, use XPath (or lxml). – Bob Dylan Jan 06 '16 at 15:16
  • Thanks @BobDylan. I guess I'll have to throw way `xmltodict` and go straight with `lxml` :( – el.atomo Jan 06 '16 at 15:27
  • 1
    Maybe if you explain what you want to *do* actually, that would give people a chance to propose a solution. – Tomalak Jan 06 '16 at 15:40
  • @Tomalak, I'm interested on preserving the original element order. For my needs, `` (the original XML in my example) is not the same as `` (the unparsed output of xmltodict). – el.atomo Jan 06 '16 at 15:56
  • Yes, but what *for*? (Also, any reason why using a DOM API has not been your first choice?) – Tomalak Jan 06 '16 at 17:52
  • 1
    @Tomalak, the XML I provided is a simplification of one application configuration file. The elements `` and `` represent operations for that system, to be applied _sequentially_. The application I'm working on interacts with that system, but doesn't need to know every schema detail. I chose _xmltodict_ for convenience; for my purposes is beautifully simple, it parses directly to python dicts, which enables direct JSON serialization for other purposes. – el.atomo Jan 06 '16 at 20:29
  • Thank you. It's a good thing to add such a paragraph to the question to make it easier for future visitors to decide whether this is something for them or not (also to provide a little more "beef" for search engines). – Tomalak Jan 06 '16 at 21:12

1 Answers1

2

It's not super elegant, but minidom can do the job just fine:

import xml.dom.minidom as minidom

xml = """
<root>
<a />
<b />
<a />
</root>
"""
doc = minidom.parseString(xml)                  # or minidom.parse(filename)
root = doc.getElementsByTagName('root')[0]      # or doc.documentElement
items = [n for n in root.childNodes if n.nodeType == doc.ELEMENT_NODE]

for item in items:
    print item.nodeName

You can of course use a full-blown DOM API like lxml but for the modest task of iterating some nodes in document order it's maybe not necessary.

Tomalak
  • 332,285
  • 67
  • 532
  • 628