XML parser-writer that keeps Attributes order

Question

I need to parse an XML document and then write every node to a separate file, keeping the exact order of attributes. So if I have an input file like:

<item a="a" b="b" c="c"/>
<item a="a1" b="b2" c="c3"/>

The output should be 2 files, one per item. If xml.dom.minidom is used, the attribute order changes in the output (I can get <item b="b" c="c" **a="a"**/>).

I found the pxdom library; it keeps the order but is very slow (minidom parsing takes 0.08 sec, pxdom parsing takes 2.5 sec).

Are there any other Python libraries that can keep the attribute order?

UPD: the library should also preserve upper and lower case, so "Item" is not equal to "item".

The general consensus is that attribute order doesn't matter. Why do you need to keep them ordered? — , Oct 23 '10 at 11:25
That's not my wish. Unfortunately, the airfare GDS (global distribution system) I'm working with requires an exact match of attribute order. – Andrew, Oct 23 '10 at 11:30

score 1 · Accepted Answer · edited May 23 '17 at 12:22

You might find this question useful. In short, standard XML tools and libraries most likely won't be able to do this.

answered Oct 23 '10 at 16:19

snapshoe

Thanks, I saw that question; pxdom does it but is very slow. In general, the problem is to find a library that uses a list (instead of a dict) to store attributes. – Andrew Oct 23 '10 at 17:50
A library that does this would have to store both a dict and a list, for both the mapping and the order. Or possibly an OrderedDict. I tried this scenario with `lxml` before posting this answer, and no matter how many attributes I added, the keys *were* always in the order as listed in the xml file. But I have no idea if that is guaranteed. – snapshoe Oct 23 '10 at 19:27
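
For later readers: a minimal sketch of the same experiment using only the standard library, assuming Python 3.8 or newer, where xml.etree.ElementTree serializes attributes in the order they were parsed instead of sorting them. The wrapper element `items`, the sample data and the helper name `split_items` are made up for illustration. Note that ElementTree writes empty elements as `<item ... />` with a space before the slash, so it may still not be a byte-for-byte match for a strict consumer.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the two items from the question, wrapped in a root
# element so the input is a single well-formed document. The first tag is
# capitalised and the second has shuffled attributes, to show that both
# letter case and attribute order survive the round trip.
SAMPLE = '<items><Item a="a" b="b" c="c"/><item c="c3" a="a1" b="b2"/></items>'


def split_items(xml_text):
    """Serialize each child of the root element as its own string."""
    root = ET.fromstring(xml_text)
    # encoding="unicode" returns str without an XML declaration;
    # strip() drops any whitespace that followed the element in the source.
    return [ET.tostring(child, encoding="unicode").strip() for child in root]


pieces = split_items(SAMPLE)
for piece in pieces:
    print(piece)
```

Each string in `pieces` can then be written to its own file. On Python versions before 3.8 the attributes would come out sorted alphabetically, so this only helps on newer interpreters.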

score 0 · Answer 2 · answered Oct 23 '10 at 11:08

You can use BeautifulSoup:

>>> from BeautifulSoup import BeautifulSoup as soup

>>> html = '''<item a="a" b="b" c="c"/>
<item a="a1" b="b2" c="c3"/>'''
>>> s = soup(html)
>>> s.findAll('item')
[<item a="a" b="b" c="c"></item>, <item a="a1" b="b2" c="c3"></item>]

rubik

Unfortunately, BeautifulSoup converts all node names to lower case, and it seems that BeautifulSoup cannot be case-sensitive. – Andrew Oct 23 '10 at 11:37
You can keep the letter case by choosing to parse as XML, e.g. s = soup(html, "xml") – RaamEE Mar 07 '17 at 12:59
