Generating XML in Python

Question

I'm developing an API using Python which makes server calls using XML. I am debating on whether to use a library (ex. http://wiki.python.org/moin/MiniDom) or if it would be "better" (meaning less overhead and faster) to use string concatenation in order to generate the XML used for each request. Also, the XML I will be generating is going to be quite dynamic so I'm not sure if something that allows me to manage elements dynamically will be a benefit.

Why not use JSON? The module is easy to use, it is a standard for web APIs, and is platform independent. — Michael David Watson, May 02 '13 at 21:23
I would love you use JSON. Unfortunately, the API I'm using (Authorize.net) is not quite up on the better web technologies. — Vincent Catalano, May 02 '13 at 21:26

score 2 · Answer 1 · edited May 23 '17 at 11:49

Since you are just using authorize.net, why not use a library specifically designed for the Authorize.net API and forget about constructing your own XML calls?

If you want or need to go your own way with XML, don't use minidom, use something with an ElementTree interface such as cElementTree (which is in the standard library). It will be much less painful and probably much faster. You will certainly need an XML library to parse the XML you produce, so you might as well use the same API for both.

It is very unlikely that the overhead of using an XML library will be a problem, and the benefit in clean code and knowing you can't generate invalid XML is very great.

If you absolutely, positively need to be as fast as possible, use one of the extremely fast templating libraries available for Python. They will probably be much faster than any naive string concatenation you do and will also be safe (i.e do proper escaping).

bigh_29 · Answer 2 · 2017-11-02T10:59:37.987

Another option is to use Jinja, especially if the dynamic nature of your xml is fairly simple. This is a common idiom in flask for generating html responses.

Here is an example of a jinja template that generates the XML of an aws S3 list objects response. I usually store the template in a separate file to avoid polluting my elegant python with ugly xml.

from datetime import datetime
from jinja2 import Template

list_bucket_result = """<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
    <Name>{{bucket_name}}</Name>
    <Prefix/>
    <KeyCount>{{object_count}}</KeyCount>
    <MaxKeys>{{max_keys}}</MaxKeys>
    <IsTruncated>{{is_truncated}}</IsTruncated>
    {%- for object in object_list %}
    <Contents>
        <Key>{{object.key}}</Key>
        <LastModified>{{object.last_modified_date.isoformat()}}</LastModified>
        <ETag></ETag>
        <Size>{{object.size}}</Size>
        <StorageClass>STANDARD</StorageClass>
    </Contents>{% endfor %}
</ListBucketResult>
"""

class BucketObject:
    def __init__(self, key, last_modified_date, size):
        self.key = key
        self.last_modified_date = last_modified_date
        self.size = size

object_list = [
    BucketObject('/foo/bar.txt', datetime.utcnow(), 10*1024 ),
    BucketObject('/foo/baz.txt', datetime.utcnow(), 29*1024 ),
]

template = Template(list_bucket_result)
result = template.render(
    bucket_name='test-bucket',
    object_count=len(object_list),
    max_keys=1000,
    is_truncated=False,
    object_list=object_list
)
print result

Output:

<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
    <Name>test-bucket</Name>
    <Prefix/>
    <KeyCount>2</KeyCount>
    <MaxKeys>1000</MaxKeys>
    <IsTruncated>False</IsTruncated>
    <Contents>
        <Key>/foo/bar.txt</Key>
        <LastModified>2017-10-31T02:28:34.551000</LastModified>
        <ETag></ETag>
        <Size>10240</Size>
        <StorageClass>STANDARD</StorageClass>
    </Contents>
    <Contents>
        <Key>/foo/baz.txt</Key>
        <LastModified>2017-10-31T02:28:34.551000</LastModified>
        <ETag></ETag>
        <Size>29696</Size>
        <StorageClass>STANDARD</StorageClass>
    </Contents>
</ListBucketResult>

eandersson · Accepted Answer · 2013-05-02T21:33:11.573

1

I would definitely recommend that you make use of one of the Python libraries; such as MiniDom, ElementTree, lxml.etree or pyxser. There is no reason to not to, and the potential performance impact will be minimal.

Although, personally I prefer using simplejson (or simply json) instead.

my_list = ["Value1", "Value2"]
json = simplejson.dumps(my_list)
# send json

edited May 02 '13 at 21:33

answered May 02 '13 at 21:19

eandersson

25,781
8
89
110

score 1 · Answer 4 · answered May 02 '13 at 21:21

My real question is what are the biggest concerns for what you're trying to accomplish? If you're worried about speed/memory, then yes, minidom does take a hit. If you want something that's fairly reliable that you can deploy quickly, I'd say use it.

My suggestion for dealing with XML in any language(Java, Python, C#, Perl, etc) is to consider using something already existing. Everyone has written their own XML parser at least once, and then they never do so again because it's such a pain in the behind. And to be fair, these libraries will have already fixed 99.5% of any problems you would run into.

score 1 · Answer 5 · answered May 02 '13 at 21:46

I recommend LXML. It's a Python library of bindings for the very fast C libraries libxml2 and libxslt.

LXML supports XPATH, and has an elementTree implementation. LXML also has an interface called objectify for writing XML as object hierarchies:

from lxml import etree, objectify
E = objectify.ElementMaker(annotate=False)

my_alpha = my_alpha = E.alpha(E.beta(E.gamma(firstattr='True')),
                              E.beta(E.delta('text here')))
etree.tostring(my_alpha)
# '<alpha><beta><gamma firstattr="True"/></beta><beta><delta>text here</delta></beta></alpha>'

etree.tostring(my_alpha.beta[0])
# '<beta><gamma firstattr="True"/></beta>'

my_alpha.beta[1].delta.text
# 'text here'

Generating XML in Python

5 Answers5

Linked