
I am generating an XML document in Python using an ElementTree, but the tostring function doesn't include an XML declaration when converting to plaintext.

from xml.etree.ElementTree import Element, SubElement, tostring

document = Element('outer')
node = SubElement(document, 'inner')
node.set('NewValue', '1')  # store the value as an XML attribute
print(tostring(document))  # Outputs b'<outer><inner NewValue="1" /></outer>'

I need my string to include the following XML declaration:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>

However, there does not seem to be any documented way of doing this.

Is there a proper method for rendering the XML declaration in an ElementTree?

Stevoisiak
Roman Alexander

11 Answers


I am surprised to find that there doesn't seem to be a way to do this with ElementTree.tostring() (before Python 3.8, at least). You can, however, use ElementTree.ElementTree.write() to write your XML document to an in-memory file:

from io import BytesIO
from xml.etree import ElementTree as ET

document = ET.Element('outer')
node = ET.SubElement(document, 'inner')
et = ET.ElementTree(document)

f = BytesIO()
et.write(f, encoding='utf-8', xml_declaration=True) 
print(f.getvalue())  # your XML file, encoded as UTF-8

See this question. Even then, I don't think you can get your 'standalone' attribute without prepending it yourself.
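A minimal sketch of that prepending approach, building on the BytesIO example above: the declaration (including standalone, copied from the question) is written by hand first, and xml_declaration=False stops ElementTree from emitting a second one. This is a workaround, not an ElementTree feature for standalone.

```python
from io import BytesIO
from xml.etree import ElementTree as ET

document = ET.Element('outer')
ET.SubElement(document, 'inner')
et = ET.ElementTree(document)

f = BytesIO()
# Write the full declaration, including standalone, by hand first
f.write(b'<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>\n')
# Then let ElementTree append the document without a declaration of its own
et.write(f, encoding='utf-8', xml_declaration=False)

data = f.getvalue()
print(data.decode('utf-8'))
```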

Community
wrgrs

I would use lxml (see http://lxml.de/api.html).

Then you can:

from lxml import etree
document = etree.Element('outer')
node = etree.SubElement(document, 'inner')
print(etree.tostring(document, xml_declaration=True))
glormph

If you pass encoding='utf8', you will get an XML header:

xml.etree.ElementTree.tostring writes an XML encoding declaration with encoding='utf8'

Sample Python code (works with Python 2 and 3):

import xml.etree.ElementTree as ElementTree

tree = ElementTree.ElementTree(
    ElementTree.fromstring('<xml><test>123</test></xml>')
)
root = tree.getroot()

print('without:')
print(ElementTree.tostring(root, method='xml'))
print('')
print('with:')
print(ElementTree.tostring(root, encoding='utf8', method='xml'))

Python 2 output:

$ python2 example.py
without:
<xml><test>123</test></xml>

with:
<?xml version='1.0' encoding='utf8'?>
<xml><test>123</test></xml>

With Python 3 you will note the b prefix, indicating that a bytes object is returned (Python 2 also returns a byte string; its str type is just printed without a prefix):

$ python3 example.py
without:
b'<xml><test>123</test></xml>'

with:
b"<?xml version='1.0' encoding='utf8'?>\n<xml><test>123</test></xml>"
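A quick way to see why the spelling matters on Python 3: tostring() delegates to write(), which only adds the declaration on its own when the lower-cased encoding is not utf-8, us-ascii or unicode. So the valid name 'UTF-8' gives no header while 'utf8' does. This sketch just compares the two spellings; the variable names are illustrative.

```python
import xml.etree.ElementTree as ElementTree

root = ElementTree.fromstring('<xml><test>123</test></xml>')

# 'utf8' is not in write()'s exempt list, so a declaration is added automatically
with_utf8 = ElementTree.tostring(root, encoding='utf8')
# 'UTF-8' lower-cases to 'utf-8', which is exempt, so no declaration is added
with_utf_dash_8 = ElementTree.tostring(root, encoding='UTF-8')

print(with_utf8)
print(with_utf_dash_8)
```</xml>"
assert with_utf_dash_8 == b'<xml><test>123</test></xml>'
</test>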
Alexander O'Mara
  • In Python 3, escape characters will be shown in the declaration when printing. `` – Stevoisiak Oct 30 '17 at 16:05
  • What helped in this answer was wondering why you were doing so much of this `ElementTree.ElementTree(ElementTree.fromstring(...`, and I now realize `fromstring` returns an `Element`, not an `ElementTree`, whereas the `parse` method does return an `ElementTree`. This makes trying to mock an XML file in a test suite by using a string very confusing! If you take that element and run `tostring`, it allows those encoding & method parameters, but the output is missing the ` – Davos Apr 20 '18 at 07:24
  • Note that `utf8` is NOT a valid character encoding string. That's also why Python3 adds the declaration and returns the whole thing as Bytes instead of string. – mbirth Sep 15 '19 at 19:44
  • @mbirth so the method should be stated as "tobytes" not 'tostring'. – Marek Marczak Dec 05 '20 at 17:42
  • @MarekMarczak No, the XML should read `encoding='utf-8'` to be valid. – mbirth Dec 09 '20 at 18:56
  • You mean the declaration or actual text encoding? The result is 'bytes', not 'string'. It's just confusing. – Marek Marczak Dec 09 '20 at 21:55
  • This should be the accepted answer. It works exactly as needed. – Alexis Evelyn Jan 24 '21 at 10:09

xml_declaration Argument

Is there a proper method for rendering the XML declaration in an ElementTree?

YES, and there is no need to use the .tostring function. Following the ElementTree documentation, you can create the Element and SubElements, wrap the root element in an ElementTree object, and finally pass the xml_declaration argument to the .write function so that the declaration line is included in the output file.

You can do it this way:

import xml.etree.ElementTree as ET

document = ET.Element("outer")
node1 = ET.SubElement(document, "inner")
node1.text = "text"

# Pass the root element to the constructor instead of calling the private _setroot()
tree = ET.ElementTree(document)
tree.write("./output.xml", encoding="UTF-8", xml_declaration=True)

And the output file is:

<?xml version='1.0' encoding='UTF-8'?>
<outer><inner>text</inner></outer>
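To double-check the result without opening the file by hand, a sketch like the following writes to a temporary directory (an assumption; the answer writes to ./output.xml), reads the text back, and re-parses it to confirm the file is well-formed:

```python
import os
import tempfile
import xml.etree.ElementTree as ET

document = ET.Element("outer")
ET.SubElement(document, "inner").text = "text"
tree = ET.ElementTree(document)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "output.xml")
    tree.write(path, encoding="UTF-8", xml_declaration=True)

    # Read the raw text to inspect the declaration line
    with open(path, encoding="utf-8") as f:
        content = f.read()

    # Parse the file back to confirm it is well-formed XML
    parsed_text = ET.parse(path).getroot().find("inner").text

print(content)
```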
smrachi

I encountered this issue recently. After digging through the code, I found the following definition of the function ElementTree.write:

def write(self, file, encoding="us-ascii"):
    assert self._root is not None
    if not hasattr(file, "write"):
        file = open(file, "wb")
    if not encoding:
        encoding = "us-ascii"
    elif encoding != "utf-8" and encoding != "us-ascii":
        file.write("<?xml version='1.0' encoding='%s'?>\n" % encoding)
    self._write(file, self._root, encoding, {})

So, if you need to write the XML header to your file, set the encoding argument to something other than the exact strings utf-8 or us-ascii, e.g. UTF-8.

alijandro
  • It would be a nice albeit brittle hack, but it doesn't seem to work (the encoding is probably lower-cased before that). Also, `ElementTree.ElementTree.write()` is documented to have an `xml_declaration` parameter (see the accepted answer). But `ElementTree.tostring()` doesn't have that parameter, which was the method asked about in the original question. – Quentin Pradet Apr 14 '15 at 07:31

Easy

Sample for both Python 2 and 3 (encoding parameter must be utf8):

import xml.etree.ElementTree as ElementTree

tree = ElementTree.ElementTree(ElementTree.fromstring('<xml><test>123</test></xml>'))
root = tree.getroot()
print(ElementTree.tostring(root, encoding='utf8', method='xml'))

Since Python 3.8 there is an xml_declaration parameter for this:

New in version 3.8: The xml_declaration and default_namespace parameters.

xml.etree.ElementTree.tostring(element, encoding="us-ascii", method="xml", *, xml_declaration=None, default_namespace=None, short_empty_elements=True)

Generates a string representation of an XML element, including all subelements. element is an Element instance. encoding is the output encoding (default is US-ASCII). Use encoding="unicode" to generate a Unicode string (otherwise, a bytestring is generated). method is either "xml", "html" or "text" (default is "xml"). xml_declaration, default_namespace and short_empty_elements has the same meaning as in ElementTree.write(). Returns an (optionally) encoded string containing the XML data.

Sample for Python 3.8 and higher:

import xml.etree.ElementTree as ElementTree

tree = ElementTree.ElementTree(ElementTree.fromstring('<xml><test>123</test></xml>'))
root = tree.getroot()
print(ElementTree.tostring(root, encoding='unicode', method='xml', xml_declaration=True))
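One caveat with the encoding='unicode' sample above: the result is a str, so there is no real byte encoding to declare, and the encoding name written into the declaration appears to be filled in from the system locale, so it can differ between machines. A sketch that pins the declared encoding to the valid name UTF-8 instead, assuming Python 3.8+:

```python
import xml.etree.ElementTree as ElementTree

root = ElementTree.fromstring('<xml><test>123</test></xml>')

# xml_declaration=True forces the header even for 'UTF-8',
# which would otherwise get no declaration at all
data = ElementTree.tostring(root, encoding='UTF-8', xml_declaration=True)
text = data.decode('utf-8')  # decode only if you need a str

print(text)
```</xml>"
assert text.startswith("<?xml version='1.0' encoding='UTF-8'?>")
</test>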
Kyrylo Malakhov

A minimal working example using the ElementTree package:

import xml.etree.ElementTree as ET

document = ET.Element('outer')
node = ET.SubElement(document, 'inner')
node.text = '1'
res = ET.tostring(document, encoding='utf8', method='xml').decode()
print(res)

the output is:

<?xml version='1.0' encoding='utf8'?>
<outer><inner>1</inner></outer>
Andriy
  • Unfortunately 'utf8' isn't a valid encoding name in XML, but 'UTF-8' is https://docs.python.org/3.8/library/xml.etree.elementtree.html#id6 – airstrike Aug 18 '19 at 01:44

Another pretty simple option is to prepend the desired header to the XML string, like this:

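import xml.etree.ElementTree as ET

root = ET.Element('invoice')  # placeholder root; use your own element here
header = '<?xml version="1.0" encoding="UTF-8"?>\n'
xml = header + ET.tostring(root, encoding='unicode')
with open('invoice.xml', 'w', encoding='utf-8') as f:  # match the declared encoding
    f.write(xml)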
Novak

I would use lxml if it is available and fall back to ElementTree otherwise:

try:
    from lxml import etree
    print("running with lxml.etree")
except ImportError:
    try:
        # Python 2.5
        import xml.etree.cElementTree as etree
        print("running with cElementTree on Python 2.5+")
    except ImportError:
        try:
            # Python 2.5
            import xml.etree.ElementTree as etree
            print("running with ElementTree on Python 2.5+")
        except ImportError:
            try:
                # normal cElementTree install
                import cElementTree as etree
                print("running with cElementTree")
            except ImportError:
                try:
                    # normal ElementTree install
                    import elementtree.ElementTree as etree
                    print("running with ElementTree")
                except ImportError:
                    print("Failed to import ElementTree from any known place")

document = etree.Element('outer')
node = etree.SubElement(document, 'inner')
# lxml accepts xml_declaration here; the standard-library ElementTree
# only accepts it from Python 3.8 onwards
print(etree.tostring(document, encoding='UTF-8', xml_declaration=True))
Alessandro

This works if you just want to print; I get an error when I try to send it to a file...

import xml.dom.minidom as minidom
import xml.etree.ElementTree as ET

def prettify(elem):
    """Return a pretty-printed XML string for elem, including an XML declaration."""
    rough_string = ET.tostring(elem, 'utf-8')
    reparsed = minidom.parseString(rough_string)
    # toprettyxml() on a Document writes <?xml version="1.0" ?> as the first line
    return reparsed.toprettyxml(indent="  ")
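If the goal is the exact declaration from the question, minidom can produce it itself: on Python 3.9+ (an assumption worth checking for your version), toprettyxml() accepts encoding and standalone arguments. Passing encoding makes it return bytes, so the result has to be written in binary mode. The element names and output path here are only for illustration; the one visible difference from the question's string should be that minidom writes no space before ?>.

```python
import os
import tempfile
import xml.dom.minidom as minidom
import xml.etree.ElementTree as ET

document = ET.Element('outer')
ET.SubElement(document, 'inner')

reparsed = minidom.parseString(ET.tostring(document, 'utf-8'))
# With encoding set, toprettyxml() returns bytes whose declaration
# includes encoding="UTF-8" and standalone="yes"
data = reparsed.toprettyxml(indent="  ", encoding="UTF-8", standalone=True)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'output.xml')
    with open(path, 'wb') as f:  # binary mode, because data is bytes
        f.write(data)
    with open(path, 'rb') as f:
        written = f.read()

print(data.decode('utf-8'))
```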

Including 'standalone' in the declaration

I didn't find any alternative for adding the standalone argument in the documentation, so I adapted the ET.tostring function to take a complete declaration as an argument.

from xml.etree import ElementTree as ET

# Sample
document = ET.Element('outer')
node = ET.SubElement(document, 'inner')

# Function that you need: prepends a caller-supplied declaration
def tostring(element, declaration, method=None):
    class dummy:
        pass
    data = [declaration + "\n"]
    file = dummy()
    file.write = data.append  # collects each chunk that write() produces
    # encoding="unicode" makes write() pass str chunks straight to file.write;
    # any other encoding would produce bytes and break the join below
    ET.ElementTree(element).write(file, encoding="unicode", method=method)
    return "".join(data)

# Working example
xdec = """<?xml version="1.0" encoding="UTF-8" standalone="no" ?>"""
xml = tostring(document, declaration=xdec)
print(xml)
G M