
I am creating an XML file and I want it to start with this declaration:

<?xml version="1.0" encoding="utf-8"?>

At the moment it only says:

<?xml version="1.0" ?>

This is how I am creating it.

import xml.etree.ElementTree as ET
from xml.dom import minidom

xmlstr = minidom.parseString(ET.tostring(root)).toprettyxml(indent="   ")
with open("Test.xml", "w", encoding='utf-8') as f:
    f.write(xmlstr)
Abhishek Rai
  • Did you copy that code from another SO question? The accepted answer there is wrong. There's no reason to convert the parsed XML data in root into a string only to parse it again with minidom to generate *another* string. The [*other* answer](https://stackoverflow.com/a/68618047/134204) is better. `ET.indent(tree, space="\t", level=0)` to indent, `tree.write(file_name, encoding="utf-8")` to write – Panagiotis Kanavos Aug 25 '21 at 11:28
  • @PanagiotisKanavos: It works if you have Python 3.9. The OP uses Python 3.7. This answer is even better: https://stackoverflow.com/a/63373633/407651 – mzjn Aug 25 '21 at 11:36
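The ElementTree-only approach from the first comment can be sketched like this. It is a minimal sketch that assumes Python 3.9 or later (ET.indent does not exist in 3.7, as the second comment notes) and uses a small hypothetical tree in place of the asker's root. One caveat on that advice: with encoding="utf-8", tree.write only emits a declaration if xml_declaration=True is also passed, and ElementTree writes it with single quotes, which is equivalent XML but not character-for-character the line from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical example tree standing in for the asker's `root`.
root = ET.Element("root")
child = ET.SubElement(root, "child", name="example")
child.text = "value"

tree = ET.ElementTree(root)
ET.indent(tree, space="   ", level=0)  # pretty-print in place (Python 3.9+)

# xml_declaration=True is required here: for UTF-8 output the default
# (None) leaves the declaration out entirely.
tree.write("Test.xml", encoding="utf-8", xml_declaration=True)

with open("Test.xml", encoding="utf-8") as f:
    print(f.readline().rstrip())  # <?xml version='1.0' encoding='utf-8'?>
```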

2 Answers


It is enough to add the encoding parameter:

xmlstr = minidom.parseString(ET.tostring(root)).toprettyxml(indent="   ", encoding="utf-8")

Then change "w" to "wb" in the open() call, because with an encoding argument toprettyxml returns bytes rather than str.
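A self-contained version of this fix might look like the sketch below; the small root element is a hypothetical stand-in for the asker's tree. With encoding set, toprettyxml emits exactly the double-quoted declaration from the question, and because it returns bytes the file has to be opened in binary mode.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical tree standing in for the asker's `root`.
root = ET.Element("root")
ET.SubElement(root, "child").text = "value"

# Passing encoding makes toprettyxml return bytes and adds the
# encoding attribute to the XML declaration.
xml_bytes = minidom.parseString(ET.tostring(root)).toprettyxml(
    indent="   ", encoding="utf-8"
)

# Bytes must be written in binary mode ("wb"), not text mode ("w").
with open("Test.xml", "wb") as f:
    f.write(xml_bytes)

print(xml_bytes.splitlines()[0])  # b'<?xml version="1.0" encoding="utf-8"?>'
```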

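Alternatively, as suggested in the comments, you can drop the redundant round trip (serializing to a string only to parse it again with minidom). With lxml this is a single call: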

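from lxml import etree

# root must be an lxml element (built with etree.Element / etree.SubElement);
# an xml.etree.ElementTree element cannot be serialized by lxml.
xml_object = etree.tostring(root,
                            pretty_print=True,
                            xml_declaration=True,
                            encoding='UTF-8')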

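Then write xml_object, which is bytes, to a file opened in binary mode ("wb"). Note that lxml writes the declaration with single quotes, which is equivalent XML.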

sophros
  • And probably remove `minidom` as well. The question's code writes the DOM into a string only to parse it back into a DOM to write it back to a string. Looks like that code was copy-pasted from another SO question – Panagiotis Kanavos Aug 25 '21 at 11:25
  • @AbhishekRai don't use *two* parsers. Use either `minidom` or `ElementTree`. If you already use `ElementTree` you can indent and output the XML string using only ET methods – Panagiotis Kanavos Aug 25 '21 at 11:29
  • @AbhishekRai your code is identical to other SO questions. – Panagiotis Kanavos Aug 25 '21 at 11:31

Adding the encoding argument makes toprettyxml return a byte string, which is why the file has to be opened in binary mode. To get a str back, as in your original code, while still producing <?xml version="1.0" encoding="utf-8"?>, decode the result:

xmlstr = minidom.parseString(ET.tostring(root)).toprettyxml(indent="   ", encoding="utf-8").decode("utf-8")
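Put together, the decode variant might look like the sketch below; again a small hypothetical tree stands in for the asker's root. Because the result is a str again, the original text-mode open(..., "w", encoding='utf-8') call can stay as it was.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical tree standing in for the asker's `root`.
root = ET.Element("root")
ET.SubElement(root, "child").text = "value"

# encoding="utf-8" puts the encoding into the declaration;
# .decode() turns the resulting bytes back into a str.
xmlstr = minidom.parseString(ET.tostring(root)).toprettyxml(
    indent="   ", encoding="utf-8"
).decode("utf-8")

# xmlstr is a str, so the text-mode write from the question works unchanged.
with open("Test.xml", "w", encoding="utf-8") as f:
    f.write(xmlstr)

print(xmlstr.splitlines()[0])  # <?xml version="1.0" encoding="utf-8"?>
```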
Vladimir S.