3

I need to parse an XML document and then write every node to a separate file, keeping the exact order of the attributes. So if I have an input file like:

<item a="a" b="b" c="c"/>
<item a="a1" b="b2" c="c3"/>

The output should be 2 files, one per item. If xml.dom.minidom is used, the attribute order is changed in the output (I can get <item b="b" c="c" a="a"/>).

I found the pxdom lib; it keeps the order but is very slow (minidom parsing takes 0.08 sec, pxdom parsing takes 2.5 sec).

Are there any other Python libraries that can keep the attribute order?

UPD: the library should also preserve upper and lower case, so "Item" is not equal to "item".

Deduplicator
Andrew
  • The general consensus is that attribute order doesn't matter. Why do you need to keep them ordered? –  Oct 23 '10 at 11:25
  • That's not my wish. Unfortunately the airfare GDS (global distribution system) I'm working with requires an exact match of attribute order. – Andrew Oct 23 '10 at 11:30

2 Answers

1

You might find this question useful. Bottom line: standard XML tools and libraries most likely won't be able to do this.

snapshoe
  • Thanks, I saw that question. pxdom does it, but it is very slow. In general the problem is to find a library that uses a list (instead of a dict) as storage for attributes. – Andrew Oct 23 '10 at 17:50
  • A library that does this would have to store both a dict and a list, for both the mapping and the order. Or possibly an OrderedDict. I tried this scenario with `lxml` before posting this answer, and no matter how many attributes I added, the keys *were* always in the order as listed in the xml file. But I have no idea if that is guaranteed. – snapshoe Oct 23 '10 at 19:27
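
The idea from the comments above, keeping attributes in a list rather than a dict, is what the standard library's expat binding does when `ordered_attributes` is set on the parser. Here is a minimal sketch, assuming Python 3 and a hypothetical `<items>` wrapper root (the sample in the question has no single root element). `parse_ordered` and `render` are illustrative names, not part of any library:

```python
import xml.parsers.expat
from xml.sax.saxutils import quoteattr


def parse_ordered(xml_text):
    """Return (name, [(attr, value), ...]) for each element, in document order."""
    elements = []

    def start_element(name, attrs):
        # With ordered_attributes enabled, attrs is a flat list:
        # [name1, value1, name2, value2, ...] in document order.
        pairs = list(zip(attrs[0::2], attrs[1::2]))
        elements.append((name, pairs))

    parser = xml.parsers.expat.ParserCreate()
    parser.ordered_attributes = True
    parser.StartElementHandler = start_element
    parser.Parse(xml_text, True)
    return elements


def render(name, pairs):
    """Serialize one element as an empty tag, keeping attribute order."""
    rendered = "".join(" %s=%s" % (key, quoteattr(value)) for key, value in pairs)
    return "<%s%s/>" % (name, rendered)


# Hypothetical input: the <items> wrapper is an assumption added for well-formedness.
sample = '<items><Item c="c" a="a" b="b"/><item a="a1" b="b2" c="c3"/></items>'
parsed = parse_ordered(sample)
for name, pairs in parsed[1:]:  # skip the wrapper root
    print(render(name, pairs))
```

Element names come through with their original case, so "Item" and "item" stay distinct. This sketch ignores text content and nesting; it only shows where the ordered attribute list comes from.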
0

You can use BeautifulSoup:

>>> from BeautifulSoup import BeautifulSoup as soup

>>> html = '''<item a="a" b="b" c="c"/>
<item a="a1" b="b2" c="c3"/>'''
>>> s = soup(html)
>>> s.findAll('item')
[<item a="a" b="b" c="c"></item>, <item a="a1" b="b2" c="c3"></item>]
rubik
  • Unfortunately BeautifulSoup changes all nodes to lower case, and it seems that BeautifulSoup cannot be case sensitive. – Andrew Oct 23 '10 at 11:37
  • You can keep the letter case by choosing the XML parser, e.g. s = soup(html, "xml") – RaamEE Mar 07 '17 at 12:59
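
For readers on Python 3.8 or later: the standard library documents that `xml.etree.ElementTree.tostring()` and `write()` preserve attribute order as of 3.8, and element names stay case sensitive. A hedged sketch of the split-into-files task from the question, assuming a hypothetical `<items>` wrapper root and the illustrative names `split_items` and `item_N.xml`:

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def split_items(xml_text, out_dir):
    """Write each child of the root element to its own file; return the paths."""
    root = ET.fromstring(xml_text)
    paths = []
    for index, child in enumerate(root):
        # Drop any whitespace that followed the element in the input,
        # since tostring() would otherwise append it.
        child.tail = None
        path = Path(out_dir) / f"item_{index}.xml"
        path.write_text(ET.tostring(child, encoding="unicode"), encoding="utf-8")
        paths.append(path)
    return paths


# Hypothetical input: the <items> wrapper is an assumption added for well-formedness.
sample = '<items><Item c="c" a="a" b="b"/><item a="a1" b="b2" c="c3"/></items>'
out_dir = tempfile.mkdtemp()
written = [p.read_text(encoding="utf-8") for p in split_items(sample, out_dir)]
for text in written:
    print(text)
```

The serialized form is not guaranteed to be byte-identical to the input (for example, ElementTree may format empty elements differently), so if the receiving system compares raw bytes it is worth checking the output against a known-good sample first.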