65

Possible Duplicate:
Converting XML to JSON using Python?

I'm doing some work on App Engine and I need to convert an XML document being retrieved from a remote server into an equivalent JSON object.

I'm using xml.dom.minidom to parse the XML data being returned by urlfetch. I'm also trying to use django.utils.simplejson to convert the parsed XML document into JSON. I'm completely at a loss as to how to hook the two together. Below is the code I'm tinkering with:

from xml.dom import minidom
from django.utils import simplejson as json

#pseudo code that returns actual xml data as a string from remote server. 
result = urlfetch.fetch(url,'','get');

dom = minidom.parseString(result.content)
json = simplejson.load(dom)

self.response.out.write(json)
Community
  • 1
  • 1
Geuis
  • 41,122
  • 56
  • 157
  • 219

7 Answers7

85

xmltodict (full disclosure: I wrote it) can help you convert your XML to a dict+list+string structure, following this "standard". It is Expat-based, so it's very fast and doesn't need to load the whole XML tree in memory.

Once you have that data structure, you can serialize it to JSON:

import xmltodict, json

o = xmltodict.parse('<e> <a>text</a> <a>text</a> </e>')
json.dumps(o) # '{"e": {"a": ["text", "text"]}}'
Martin Blech
  • 13,135
  • 6
  • 31
  • 35
  • 4
    Have you written the inverse? I'd be interested in such an animal, I think. – Mattie May 02 '12 at 19:26
  • I haven't, but it doesn't sound too hard to do. I woudn't know what to do with "semi structured" xml, though. Stuff like `text c moretext` -> `{'a': {'#text': 'text moretext', 'b': 'c'}}` -> now what? – Martin Blech May 09 '12 at 00:19
  • 1
    If you go strictly by Goessner's article, you actually should have had `{'a': 'text c moretext'}`, which then roundtrips to `text <b>c</b> moretext`... the mismatch between JSON and XML makes situations like this very awkward. I took a stab at the whole thing using `ElementTree` anyway, though, for API work I'm doing. https://github.com/zigg/xon – Mattie May 09 '12 at 01:03
  • 1
    @MartinBlech xmltodict works perfectly for my ebay rss reader project, thank you! – Haqa Sep 16 '12 at 23:31
  • 6
    xmltodict appears to have an "unparse" method that will do the inverse now – Rob Dennis May 06 '14 at 17:22
  • there is some problem with the libexpat library, it can't handle full-width bracket in the element name. – Wei Qiu Feb 13 '18 at 10:00
27

Soviut's advice for lxml objectify is good. With a specially subclassed simplejson, you can turn an lxml objectify result into json.

import simplejson as json
import lxml

class objectJSONEncoder(json.JSONEncoder):
  """A specialized JSON encoder that can handle simple lxml objectify types
      >>> from lxml import objectify
      >>> obj = objectify.fromstring("<Book><price>1.50</price><author>W. Shakespeare</author></Book>")       
      >>> objectJSONEncoder().encode(obj)
      '{"price": 1.5, "author": "W. Shakespeare"}'       
 """


    def default(self,o):
        if isinstance(o, lxml.objectify.IntElement):
            return int(o)
        if isinstance(o, lxml.objectify.NumberElement) or isinstance(o, lxml.objectify.FloatElement):
            return float(o)
        if isinstance(o, lxml.objectify.ObjectifiedDataElement):
            return str(o)
        if hasattr(o, '__dict__'):
            #For objects with a __dict__, return the encoding of the __dict__
            return o.__dict__
        return json.JSONEncoder.default(self, o)

See the docstring for example of usage, essentially you pass the result of lxml objectify to the encode method of an instance of objectJSONEncoder

Note that Koen's point is very valid here, the solution above only works for simply nested xml and doesn't include the name of root elements. This could be fixed.

I've included this class in a gist here: http://gist.github.com/345559

Morishiri
  • 874
  • 10
  • 23
Anton I. Sipos
  • 3,493
  • 3
  • 27
  • 26
15

I think the XML format can be so diverse that it's impossible to write a code that could do this without a very strict defined XML format. Here is what I mean:

<persons>
    <person>
        <name>Koen Bok</name>
        <age>26</age>
    </person>
    <person>
        <name>Plutor Heidepeen</name>
        <age>33</age>
    </person>
</persons>

Would become

{'persons': [
    {'name': 'Koen Bok', 'age': 26},
    {'name': 'Plutor Heidepeen', 'age': 33}]
}

But what would this be:

<persons>
    <person name="Koen Bok">
        <locations name="defaults">
            <location long=123 lat=384 />
        </locations>
    </person>
</persons>

See what I mean?

Edit: just found this article: http://www.xml.com/pub/a/2006/05/31/converting-between-xml-and-json.html

Koen Bok
  • 3,234
  • 3
  • 29
  • 42
  • 1
    the article you found is for javascript – George Godik Jan 19 '10 at 20:33
  • 10
    @George The article is about the general issues of roundtripping XML and JSON. It's very germane to this topic. Don't let the JavaScript code at the end throw you off. – Mattie May 02 '12 at 19:29
  • 1
    @Koen, will look like this. { "persons": { "person": { "-name": "Koen Bok", "locations": { "-name": "defaults", "location": { "-long": "123", "-lat": "384" } } } } } – Hugo Prudente Mar 09 '16 at 13:02
8

Jacob Smullyan wrote a utility called pesterfish which uses effbot's ElementTree to convert XML to JSON.

Jeff Bauer
  • 13,890
  • 9
  • 51
  • 73
  • 1
    Installing with pip seems to be broken: `UnicodeDecodeError: 'utf8' codec can't decode byte 0x8b in position 1: invalid start byte` – smci Aug 19 '14 at 14:55
4

One possibility would be to use Objectify or ElementTree from the lxml module. An older version ElementTree is also available in the python xml.etree module as well. Either of these will get your xml converted to Python objects which you can then use simplejson to serialize the object to JSON.

While this may seem like a painful intermediate step, it starts making more sense when you're dealing with both XML and normal Python objects.

Soviut
  • 88,194
  • 49
  • 192
  • 260
1

I wrote a small command-line based Python script based on pesterfesh that does exactly this:

https://github.com/hay/xml2json

Husky
  • 5,757
  • 2
  • 46
  • 41
  • This link is broken. It would be nice if it were updated (if the script still exists). – Alison R. Apr 06 '11 at 19:18
  • Woops, i moved the script into its own repository. Thanks for noticing the 404! – Husky Apr 06 '11 at 22:25
  • Thanks, this is perfect. One thing to note is that it needs simplejson to be installed: sudo easy_install simplejson – Suman Apr 12 '12 at 21:30
  • @srs2012 I haven't tried this specific script, but I noticed that `pesterfish` imports `simplejson` specifically when it appears it would do just fine with the standard library `json`. See http://stackoverflow.com/a/712799/722332 for more on this. – Mattie May 02 '12 at 19:24
  • -1 Your code doesn't work. a) It ignores attributes of tags: `xml2json.xml2json(' ', no_options)` drops the href attributes and gives only `'{"a": {"b": null}}'` – smci Aug 19 '14 at 14:39
  • b) It bizarrely requires a compulsory 2nd arg which must be a Python object and must have a `pretty` attribute, even if that is False: `options = type("anonobj", (object,), dict(pretty=False))`. Instead of defaulting `options={}` and using `options.getkey('pretty',default=False)` – smci Aug 19 '14 at 14:45
  • c) And it returns the string from json.dumps, not the actual json object. We might actually want the json object instead. You could add an optional arg `to_string=False`. – smci Aug 19 '14 at 14:47
1

In general, you want to go from XML to regular objects of your language (since there are usually reasonable tools to do this, and it's the harder conversion). And then from Plain Old Object produce JSON -- there are tools for this, too, and it's a quite simple serialization (since JSON is "Object Notation", natural fit for serializing objects). I assume Python has its set of tools.

StaxMan
  • 113,358
  • 34
  • 211
  • 239