My XML file starts with <?xml version="1.0" encoding="utf-8"?>, but the declaration disappears when I serialize the tree to a string: the resulting string no longer begins with it. I thought I could simply prepend it to the string, as in the code below (which worked when printing it), but when I saved the string as an XML file again on my laptop and opened it, <?xml version="1.0" encoding="utf-8"?> was gone.

import xml.etree.ElementTree as ET

tree = ET.parse(r'someData.xml')
root = tree.getroot()

xml_str = ET.tostring(root, encoding='unicode')
xml_str = '<?xml version="1.0" encoding="utf-8"?>' + xml_str

Does anybody know how to serialize the XML to a string without losing <?xml version="1.0" encoding="utf-8"?>, or how to save the string as XML without losing it? My aim is to have the declaration in my exported XML file. Thank you in advance!

nwellnhof
John
  • `xml_declaration=True` should work. – Joachim Sauer Nov 30 '22 at 09:33
  • @JoachimSauer No, unfortunately this gives me ```\n``` at the beginning instead of `````` – John Nov 30 '22 at 09:36
  • Then try specifying actual `UTF-8` as the encoding. Note, however, that in that case `ET.tostring()` will return a byte string (since a concrete encoding doesn't make sense when returning a unicode string). – Joachim Sauer Nov 30 '22 at 09:53
  • @JoachimSauer Yes, that did work!! I had to convert it to a string using .decode(). However, when I save it with ```tree = ET.ElementTree(ET.fromstring(xml_str)); tree.write(open('test2.xml', 'a'), encoding='unicode')```, the declaration is not in the exported XML file. Any idea how to solve that? – John Nov 30 '22 at 10:03
  • The element tree itself doesn't contain that information, so going back and forth between string and ElementTree is pointless. Simply use your (original, unmodified) ElementTree and call `tree.write(..., encoding='UTF-8')` on it! The encoding is *purely* a property of the **representation** (i.e. the actual bytes) and not part of the XML data stored in the file. That's why it's "lost" when you go through an ElementTree object. – Joachim Sauer Nov 30 '22 at 10:43
  • @JoachimSauer Thanks! It's solved now with ```tree.write(open('test.xml', 'wb'), encoding='UTF-8', xml_declaration=True)``` – John Nov 30 '22 at 10:54
  • See also [this answer](https://stackoverflow.com/a/4999510/235698). – Mark Tolonen Nov 30 '22 at 17:56
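
The string-serialization route discussed in the comments can be sketched as follows. This is a minimal illustration, not the asker's exact code: the inline `<data>` document is a hypothetical stand-in for `someData.xml`, and the `xml_declaration` argument of `ET.tostring()` assumes Python 3.8 or newer.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the root obtained from ET.parse('someData.xml').
root = ET.fromstring("<data><item>1</item></data>")

# With a concrete encoding, tostring() returns bytes rather than str.
# xml_declaration=True asks ElementTree to emit the declaration itself.
xml_bytes = ET.tostring(root, encoding="utf-8", xml_declaration=True)

# Decode only if a str is really needed, e.g. for printing.
xml_str = xml_bytes.decode("utf-8")
print(xml_str)
```

Note that ElementTree writes the declaration with single quotes (`<?xml version='1.0' encoding='utf-8'?>`), which is equivalent XML to the double-quoted form in the original file.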

1 Answer

To make the solution from the comments more visible:

Previously, I saved the file like this:

tree = ET.ElementTree(ET.fromstring(xml_str)) 
tree.write(open('test2.xml', 'a'), encoding='unicode')

Now I save it like this, so the declaration is kept at the beginning of the XML file:

tree = ET.ElementTree(ET.fromstring(xml_str)) 
tree.write(open('test.xml', 'wb'), encoding="utf-8", xml_declaration=True)
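
As a self-contained check of the same idea, the sketch below writes a tree and reads back the first line of the file. The inline document and the temporary output path are assumptions made for illustration; passing a file name instead of an open file object lets `tree.write()` open and close the file itself.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the tree parsed from the original file.
tree = ET.ElementTree(ET.fromstring("<data><item>1</item></data>"))

# Write to a temporary location (illustrative path, not from the question).
out_path = os.path.join(tempfile.mkdtemp(), "test.xml")
tree.write(out_path, encoding="utf-8", xml_declaration=True)

# Read the first line back as bytes to confirm the declaration was written.
with open(out_path, "rb") as f:
    first_line = f.readline()
print(first_line)
```

Since the declaration only describes how the bytes are encoded, it is produced at write time; there is no need to round-trip through a string first.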
John