I am trying to create an XML file using lxml. I am well aware that the order of attributes in XML doesn't matter, but I am still looking for a way to preserve the attributes in a specific order.

I also tried minidom, and that didn't work either.

In lxml I have the following code:

from lxml import etree as ET
from collections import OrderedDict
root = ET.Element("Root", OrderedDict([("id","0"),("start","0"),("end","200")]))
ET.tostring(root)

This part gives the following output, with the attributes in the order I wanted, since I used OrderedDict here:

<Root id="0" start="0" end="200"/>

Then I created a Child using the same method:

child1 = ET.Element("sentence", OrderedDict([("id","0"),("start","0"),("end","255")]))
root.append(child1)
xml_str = ET.tostring(root, pretty_print=True)
print(xml_str)

Printing xml_str gives the output I expect:

<Root id="0" start="0" end="200">\n  <sentence id="0" start="0" end="255"/>\n</Root>

But when it comes to writing it to an XML file:

with open('op.xml', 'wb') as f:
  f.write(xml_str)

The output isn't the same when written to the op.xml file:

<?xml version="1.0"?>

<Root end="200" start="0" id="0">
       <sentence end="255" start="0" id="0"/>
</Root>

Clearly the attribute order has changed. Is there any way to get the output I expect, i.e. with the attribute order maintained?

I have tried using minidom too, but it didn't work there either, even after referring to: Preserve order of attributes when modifying with minidom

Nicolò Gasparini

2 Answers

Here is a library that is good at extracting data and less suited to modifying XML files, but it can basically meet your needs.

from simplified_scrapy import SimplifiedDoc, utils, req
doc = SimplifiedDoc("<Root></Root>")
doc.Root.setAttrs({"id":"0","start":"0","end":"200"})
doc.Root.setContent("<sentence />")
doc.sentence.setAttrs({"id":"0","start":"0","end":"200"})
utils.saveFile("op.xml",doc.html)

Result:

<Root id="0" start="0" end="200"><sentence id="0" start="0" end="200" /></Root>
dabingsou
  • Tried it, but this also gives the XML with the order of attributes changed, so it doesn't provide the expected result. – Vipulkumar Yadav Jun 03 '20 at 02:14
  • @VipulkumarYadav Please confirm. It writes the attributes in the order in which they were entered into the dictionary. – dabingsou Jun 03 '20 at 02:44
  • What worked for the code in my question is this: I wrote the output to a .txt file, where the order remained as I wanted, and then changed the file extension from .txt to .xml. However, this only worked on Ubuntu, not on Windows. – Vipulkumar Yadav Jun 03 '20 at 03:13
  • It's so strange. I have no problem testing on mac or windows. – dabingsou Jun 03 '20 at 05:34
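
One way to narrow this down is to read the raw bytes back from disk and compare them with what was written. The op.xml content shown in the question includes an XML declaration that xml_str never contained, which hints that the file may have been viewed through a program that re-renders XML, though the thread does not confirm this. The sketch below uses only the standard library; the byte string is copied from the question's printed output, and the helper name and file path are hypothetical.

```python
import os
import tempfile

# Bytes copied from the printed xml_str in the question (an assumption:
# the real value comes from ET.tostring(root, pretty_print=True)).
xml_bytes = (
    b'<Root id="0" start="0" end="200">\n'
    b'  <sentence id="0" start="0" end="255"/>\n'
    b'</Root>'
)


def write_and_read_back(data, path):
    """Write data in binary mode, then return the raw bytes found on disk."""
    with open(path, "wb") as f:
        f.write(data)
    with open(path, "rb") as f:
        return f.read()


# Hypothetical location; any writable path works.
path = os.path.join(tempfile.gettempdir(), "op_check.xml")
on_disk = write_and_read_back(xml_bytes, path)

print(on_disk.decode("utf-8"))
print("unchanged on disk:", on_disk == xml_bytes)
```

If the comparison holds, the attribute order in the file is the one that was written, and any reordering would come from whatever is used to display the file.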

Using lxml.etree makes it work:

import lxml.etree
from collections import OrderedDict

root = lxml.etree.Element("Root", OrderedDict([("id","0"),("start","0"),("end","200")]))
isVal = lxml.etree.SubElement(root, 'sentence', OrderedDict([("id","0"),("start","0"),("end","255")]))

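# Serialize to bytes and write in binary mode, so the file holds exactly what tostring() produced
with open("xyz2.xml", 'wb') as f:
    f.write(lxml.etree.tostring(root, xml_declaration=True, encoding="utf-8"))

# Read the file back to check the attribute order on disk
with open("xyz2.xml", 'r') as f:
    print(f.read())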

Output:

<?xml version='1.0' encoding='utf-8'?>
<Root id="0" start="0" end="200"><sentence id="0" start="0" end="255"/></Root>
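
As a standard-library variation (not lxml, and not part of the answer above), the same structure can be sketched with xml.etree.ElementTree. This assumes Python 3.8 or newer, where ElementTree serializes attributes in the order they were inserted and a plain dict keeps insertion order; on older versions the attributes would come out sorted by name.

```python
import xml.etree.ElementTree as ET

# Plain dicts keep insertion order, so OrderedDict is not needed here.
root = ET.Element("Root", {"id": "0", "start": "0", "end": "200"})
ET.SubElement(root, "sentence", {"id": "0", "start": "0", "end": "255"})

# encoding="unicode" returns a str instead of bytes.
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)

# Check the order of the attributes on the root element's start tag.
root_tag = xml_text[: xml_text.index(">")]
in_order = root_tag.index("id=") < root_tag.index("start=") < root_tag.index("end=")
print("insertion order kept:", in_order)
```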
Maurice Meyer