I have a program that reads an XML document from a socket. I have the XML document stored in a string which I would like to convert directly to a Python dictionary, the same way it is done in Django's simplejson library.

Take as an example:

xml_str = '<?xml version="1.0" ?><person><name>john</name><age>20</age></person>'
dic_xml = convert_to_dic(xml_str)

Then dic_xml would look like {'person' : { 'name' : 'john', 'age' : 20 } }

Syscall
user361526

20 Answers

xmltodict (full disclosure: I wrote it) does exactly that:

import xmltodict

xmltodict.parse("""<?xml version="1.0" ?>
<person>
  <name>john</name>
  <age>20</age>
</person>""")
# {u'person': {u'age': u'20', u'name': u'john'}}
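XML text is always a string, so the result above contains `'20'` rather than the integer `20` that the question asks for. A minimal post-processing sketch, assuming you want every string that parses as an integer turned into an `int` (the helper name `coerce_ints` is hypothetical and not part of xmltodict):

```python
def coerce_ints(value):
    """Recursively convert integer-looking strings in a parsed XML dict to int.

    Hypothetical helper: walks dicts and lists, leaves other values untouched,
    and always returns plain dicts (even if the input was an OrderedDict).
    """
    if isinstance(value, dict):
        return {k: coerce_ints(v) for k, v in value.items()}
    if isinstance(value, list):
        return [coerce_ints(v) for v in value]
    if isinstance(value, str):
        try:
            return int(value)
        except ValueError:
            return value  # not a number, keep the original string
    return value


parsed = {'person': {'name': 'john', 'age': '20'}}  # shape of the output above
print(coerce_ints(parsed))  # {'person': {'name': 'john', 'age': 20}}
```

Whether a value like `'007'` should become `7` depends on your data, so treat this as a starting point rather than a general rule.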
Martin Blech
  • also, for future googlenauts - I was able to use this in App Engine, which I had been led to believe didn't play nicely with most XML libraries in Python. – LRE Mar 07 '13 at 17:14
  • Thanks, it works well. But why is there always a "u" before each string? How do I get rid of it? – Samoht Mar 28 '13 at 09:05
  • The u just indicates that it's stored as a Unicode string. It doesn't affect the value of the string in any way. – Joshua Olson Sep 11 '13 at 22:49
  • Nice. Is there a reverse (dict to xml) function or module? – ypercubeᵀᴹ Mar 06 '14 at 15:13
  • Can you please tell me how to check for a certain key without an exception? If a key doesn't exist, this OrderedDict raises an error. – A.J. Mar 07 '14 at 08:00
  • Nice. And yes, @ypercube, there is an xmltodict.unparse() function for the reverse. – Duther Sep 25 '14 at 12:07
  • This might be obvious to some but not to others: for anybody writing SOAP definitions, it's helpful to use this xmltodict module with pprint (pretty print). Just `from pprint import pprint`, then `pprint(xmltodict.parse('''your XML'''))` – Jarad Sep 18 '15 at 05:01
  • You might want to add `import xmltodict` so that one can copy-paste it. – Martin Thoma Mar 14 '16 at 11:09
  • when I try to run `xmltodict.parse("file.xml")` I get `xml.parsers.expat.ExpatError: syntax error: line 1, column 0`, any ideas what is going on? – user5359531 Jun 08 '17 at 00:26
  • @user5359531 I think the `parse` expects a string (or stream), not a filename – Maarten Fabré Jul 16 '17 at 11:43
  • This is excellent. I'm at a loss to express how little I like XML. I wish that I had found this years ago (instead of ETree, XPATH, and that awful mess). As an aside which may help others, I didn't realize that one cannot pprint.pprint() an OrderedDict (this is the result of xmltodict.parse()). I used json.loads(json.dumps("my-XML-string-object")) to get pprint.pprint() to work. Again, THANK YOU! – user2460464 Feb 28 '18 at 23:06
  • In some complex cases, it does not traverse the whole XML. My case was XML reply from Uniprot --but xmlschema worked though. So _caveat emptor_. – Matteo Ferla Sep 14 '18 at 16:00
  • Just a heads up, `xmltodict` and I think most solutions in this page do not parse external entities. – Marcos Dione Oct 24 '19 at 08:04
  • Great. This also works on `QFile` objects using `PyQt5` – Swedgin Apr 16 '20 at 13:19
  • Thank you very much, your module changes my life. Finally I will not have to deal anymore with XML and ElementTree – Peter Apr 30 '20 at 12:40
  • Error: `xml.parsers.expat.ExpatError: XML or text declaration not at start of entity`. It works when I remove the line break before the xml tag. – btt Jun 03 '20 at 09:54
  • @btt, same error here: Copying your code I get `XML or text declaration not at start of entity: line 2, column 0`. Your Solution is [here](https://stackoverflow.com/questions/6474741/error-error-parsing-xml-xml-or-text-declaration-not-at-start-of-entity) - Lesson learned: **do not** copy that much – Timo Jun 24 '21 at 19:43
  • Is this module still active in 2022? 90 issues including security ones on GitHub, no updates in about 2 years or so... – shearn89 Apr 25 '22 at 08:38
  • If you want it as a plain dictionary instead of an OrderedDict, just cast it: `dict(xmltodict.parse(...))` – Loner Apr 14 '23 at 10:24
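Note that wrapping the result in `dict(...)`, as suggested in the last comment, converts only the outermost mapping; in xmltodict versions that return `OrderedDict`, the nested values stay `OrderedDict`. A minimal recursive sketch (the name `to_plain` is hypothetical) converts every level without the `json.loads(json.dumps(...))` round trip:

```python
from collections import OrderedDict


def to_plain(value):
    """Recursively turn every mapping (e.g. OrderedDict) into a plain dict."""
    if isinstance(value, dict):  # OrderedDict is a subclass of dict
        return {k: to_plain(v) for k, v in value.items()}
    if isinstance(value, list):  # repeated tags come back as lists
        return [to_plain(v) for v in value]
    return value  # strings, None, etc. are returned unchanged


nested = OrderedDict([('person', OrderedDict([('name', 'john'), ('age', '20')]))])
plain = to_plain(nested)
print(type(plain['person']))  # <class 'dict'>
```

Since Python 3.7 plain dicts keep insertion order, so the element order is preserved.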

This is a great module that someone created. I've used it several times. http://code.activestate.com/recipes/410469-xml-as-dictionary/

Here is the code from the website just in case the link goes bad.

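try:
    from xml.etree import cElementTree as ElementTree  # Python 2 / Python <= 3.8
except ImportError:
    import xml.etree.ElementTree as ElementTree  # cElementTree was removed in Python 3.9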

class XmlListConfig(list):
    def __init__(self, aList):
        for element in aList:
            if len(element):  # has child elements (truth-testing an Element is deprecated)
                # treat like dict
                if len(element) == 1 or element[0].tag != element[1].tag:
                    self.append(XmlDictConfig(element))
                # treat like list
                elif element[0].tag == element[1].tag:
                    self.append(XmlListConfig(element))
            elif element.text:
                text = element.text.strip()
                if text:
                    self.append(text)


class XmlDictConfig(dict):
    '''
    Example usage:

    >>> tree = ElementTree.parse('your_file.xml')
    >>> root = tree.getroot()
    >>> xmldict = XmlDictConfig(root)

    Or, if you want to use an XML string:

    >>> root = ElementTree.XML(xml_string)
    >>> xmldict = XmlDictConfig(root)

    And then use xmldict for what it is... a dict.
    '''
    def __init__(self, parent_element):
        if parent_element.items():
            self.update(dict(parent_element.items()))
        for element in parent_element:
            if len(element):  # has child elements (truth-testing an Element is deprecated)
                # treat like dict - we assume that if the first two tags
                # in a series are different, then they are all different.
                if len(element) == 1 or element[0].tag != element[1].tag:
                    aDict = XmlDictConfig(element)
                # treat like list - we assume that if the first two tags
                # in a series are the same, then the rest are the same.
                else:
                    # here, we put the list in dictionary; the key is the
                    # tag name the list elements all share in common, and
                    # the value is the list itself 
                    aDict = {element[0].tag: XmlListConfig(element)}
                # if the tag has attributes, add those to the dict
                if element.items():
                    aDict.update(dict(element.items()))
                self.update({element.tag: aDict})
            # this assumes that if you've got an attribute in a tag,
            # you won't be having any text. This may or may not be a 
            # good idea -- time will tell. It works for the way we are
            # currently doing XML configuration files...
            elif element.items():
                self.update({element.tag: dict(element.items())})
            # finally, if there are no child tags and no attributes, extract
            # the text
            else:
                self.update({element.tag: element.text})

Example usage:

tree = ElementTree.parse('your_file.xml')
root = tree.getroot()
xmldict = XmlDictConfig(root)

Or, if you want to use an XML string:

root = ElementTree.XML(xml_string)
xmldict = XmlDictConfig(root)
Mazyod
James
  • You can use 'xmltodict' alternatively – mrash May 11 '15 at 15:01
  • I tried this and it's much faster than xmltodict. For parsing an 80MB xml file it took 7s, with xmltodict it took 90s – Eddy Oct 16 '15 at 21:08
  • Confirmed... I have not tested this against every edge case but for my rather uncomplicated XML strings, this is pretty fast (about 8 times faster than the `xmltodict` library). Disadvantage is that you have to host it yourself within your project. – Dirk Apr 18 '16 at 09:59
  • It seems that this code can't deal with an array like the following: text text text text text text text text – zhanglistar Jul 25 '16 at 09:56
  • Hi there, this works perfectly. I'll just add a snippet for those who can't find `cElementTree` (it was removed in Python 3.9): just change the first line to `import xml.etree.ElementTree as ElementTree` – Rafael Aguilar Sep 13 '16 at 17:14
  • If you have duplicate sub-tags with different attributes, you lose the attributes. For example, I have multiple `` tags, where every `name` is different; this method drops the `name` attribute, making it impossible to distinguish the `Project`'s from each other. – user5359531 Jun 08 '17 at 00:10
  • Down-voting since there are better answers posted below, particularly in handling multiple tags with the same name. – Maksym Jun 08 '17 at 13:22
  • On a side note, if you don't *need* to use Python and are just trying to import the XML as a structured object for manipulation, I found that it was much easier to just use R for this as per [this](https://stackoverflow.com/q/17198658/5359531) and [this](https://www.tutorialspoint.com/r/r_xml_files.htm). If you just run `library("XML"); result <- xmlParse(file = "file.xml"); xml_data <- xmlToList(result)` you will import your XML as a nested list. Multiple tags with the same name are fine & tag attributes become an extra list item. – user5359531 Jun 08 '17 at 15:52
  • I used xmltodict but it gives the error " parser.Parse(xml_input, True) ExpatError: syntax error: line 1, column 0". I have: import xmltodict def handle_artist(_, artist): print(artist['person']) return True xmltodict.parse('activity.xml',item_depth=2, item_callback=handle_artist). Do you know how to fix this error? – Zahra Dec 13 '21 at 21:33
  • I tried it using python 3. The result was wrong for my XML : empty list. I successfully used dictify (Erik Aronesty's solution below). – Eric H. Apr 12 '22 at 09:34

The following XML-to-Python-dict snippet parses entities as well as attributes following this XML-to-JSON "specification". It is the most general solution handling all cases of XML.

from collections import defaultdict

def etree_to_dict(t):
    d = {t.tag: {} if t.attrib else None}
    children = list(t)
    if children:
        dd = defaultdict(list)
        for dc in map(etree_to_dict, children):
            for k, v in dc.items():
                dd[k].append(v)
        d = {t.tag: {k:v[0] if len(v) == 1 else v for k, v in dd.items()}}
    if t.attrib:
        d[t.tag].update(('@' + k, v) for k, v in t.attrib.items())
    if t.text:
        text = t.text.strip()
        if children or t.attrib:
            if text:
                d[t.tag]['#text'] = text
        else:
            d[t.tag] = text
    return d

It is used like this:

import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9
e = ET.XML('''
<root>
  <e />
  <e>text</e>
  <e name="value" />
  <e name="value">text</e>
  <e> <a>text</a> <b>text</b> </e>
  <e> <a>text</a> <a>text</a> </e>
  <e> text <a>text</a> </e>
</root>
''')

from pprint import pprint
pprint(etree_to_dict(e))

The output of this example (as per above-linked "specification") should be:

{'root': {'e': [None,
                'text',
                {'@name': 'value'},
                {'#text': 'text', '@name': 'value'},
                {'a': 'text', 'b': 'text'},
                {'a': ['text', 'text']},
                {'#text': 'text', 'a': 'text'}]}}

Not necessarily pretty, but it is unambiguous, and simpler XML inputs result in simpler JSON. :)


Update

If you want to do the reverse, i.e. emit an XML string from a JSON/dict, you can use:

try:
  basestring
except NameError:  # python3
  basestring = str

def dict_to_etree(d):
    def _to_etree(d, root):
        if not d:
            pass
        elif isinstance(d, basestring):
            root.text = d
        elif isinstance(d, dict):
            for k,v in d.items():
                assert isinstance(k, basestring)
                if k.startswith('#'):
                    assert k == '#text' and isinstance(v, basestring)
                    root.text = v
                elif k.startswith('@'):
                    assert isinstance(v, basestring)
                    root.set(k[1:], v)
                elif isinstance(v, list):
                    for e in v:
                        _to_etree(e, ET.SubElement(root, k))
                else:
                    _to_etree(v, ET.SubElement(root, k))
        else:
            raise TypeError('invalid type: ' + str(type(d)))
    assert isinstance(d, dict) and len(d) == 1
    tag, body = next(iter(d.items()))
    node = ET.Element(tag)
    _to_etree(body, node)
    return ET.tostring(node)

pprint(dict_to_etree(etree_to_dict(e)))  # round trip of the example above; returns bytes
K3---rnc
  • Thx for this code! Additional info: if you use python 2.5 you can't use dictionary comprehension, so you have to change the line `d = {t.tag: {k:v[0] if len(v) == 1 else v for k, v in dd.iteritems()}}` to `d = { t.tag: dict( (k, v[0] if len(v) == 1 else v) for k, v in dd.iteritems() ) }` – M-- Jul 22 '13 at 09:14
  • I have tested nearly 10 snippets / python modules / etc. for that. This one is the best I have found. According to my tests, it is : 1) much faster than https://github.com/martinblech/xmltodict (based on XML SAX api) 2) better than https://github.com/mcspring/XML2Dict which has some little issues when several children have same names 3) better than http://code.activestate.com/recipes/410469-xml-as-dictionary/ which had small issues as well and more important : 4) much shorter code than all the previous ones! Thanks @K3---rnc – Basj Feb 19 '14 at 13:02
  • This is, by far, the most comprehensive answer, and it works on > 2.6, and it's fairly flexible. My only issue is that text can change where it resides depending on whether there's an attribute or not. I posted an even smaller and more rigid solution as well. – Erik Aronesty Jun 18 '15 at 19:25
  • If you need to get an ordered dict from an XML file, please, you can use this same example with few modifications (see my response below): http://stackoverflow.com/questions/2148119/how-to-convert-an-xml-string-to-a-dictionary-in-python/#32842402 – serfer2 Sep 29 '15 at 11:13
  • This is also pretty nifty and fast when used with `cElementTree` or `lxml.etree`. Note that when using Python 3, all `.iteritems()` have to be changed to `.items()` (same behaviour but the keyword changed from Python 2 to 3). – Dirk Apr 18 '16 at 12:15
  • Beware: high memory usage – fjsj Jul 30 '19 at 20:54
  • The answer that actually works perfectly! – AturSams Mar 23 '21 at 14:28

This lightweight version, while not configurable, is pretty easy to tailor as needed, and works in old pythons. Also it is rigid - meaning the results are the same regardless of the existence of attributes.

import xml.etree.ElementTree as ET

from copy import copy

def dictify(r,root=True):
    if root:
        return {r.tag : dictify(r, False)}
    d=copy(r.attrib)
    if r.text:
        d["_text"]=r.text
    for x in r.findall("./*"):
        if x.tag not in d:
            d[x.tag]=[]
        d[x.tag].append(dictify(x,False))
    return d

So:

root = ET.fromstring("<erik><a x='1'>v</a><a y='2'>w</a></erik>")

dictify(root)

Results in:

{'erik': {'a': [{'x': '1', '_text': 'v'}, {'y': '2', '_text': 'w'}]}}
Erik Aronesty

Disclaimer: This modified XML parser was inspired by Adam Clark's answer. The original XML parser works for most simple cases. However, it didn't work for some complicated XML files. I debugged the code line by line and finally fixed some issues. If you find any bugs, please let me know. I am glad to fix them.

import xml.etree.ElementTree as ElementTree


class XmlDictConfig(dict):
    '''   
    Note: the XML must have a single root element; add one if it is missing.
    Example usage:
    >>> tree = ElementTree.parse('your_file.xml')
    >>> root = tree.getroot()
    >>> xmldict = XmlDictConfig(root)
    Or, if you want to use an XML string:
    >>> root = ElementTree.XML(xml_string)
    >>> xmldict = XmlDictConfig(root)
    And then use xmldict for what it is... a dict.
    '''
    def __init__(self, parent_element):
        if parent_element.items():
            self.updateShim( dict(parent_element.items()) )
        for element in parent_element:
            if len(element):
                aDict = XmlDictConfig(element)
            #   if element.items():
            #   aDict.updateShim(dict(element.items()))
                self.updateShim({element.tag: aDict})
            elif element.items():    # items() returns the element's attributes
                elementattrib = list(element.items())
                if element.text:
                    elementattrib.append((element.tag, element.text))     # also add tag: text if text exists
                self.updateShim({element.tag: dict(elementattrib)})
            else:
                self.updateShim({element.tag: element.text})

    def updateShim (self, aDict ):
        for key in aDict.keys():   # keys() includes tag and attributes
            if key in self:
                value = self.pop(key)
                if type(value) is not list:
                    listOfDicts = []
                    listOfDicts.append(value)
                    listOfDicts.append(aDict[key])
                    self.update({key: listOfDicts})
                else:
                    value.append(aDict[key])
                    self.update({key: value})
            else:
                self.update({key:aDict[key]})  # it was self.update(aDict)    
Community
tiger

You can do this quite easily with lxml. First install it:

[sudo] pip install lxml

Here is a recursive function I wrote that does the heavy lifting for you:

from lxml import objectify as xml_objectify


def xml_to_dict(xml_str):
    """ Convert xml to dict, using lxml v3.4.2 xml processing library """
    def xml_to_dict_recursion(xml_object):
        dict_object = xml_object.__dict__
        if not dict_object:
            return xml_object
        for key, value in dict_object.items():
            dict_object[key] = xml_to_dict_recursion(value)
        return dict_object
    return xml_to_dict_recursion(xml_objectify.fromstring(xml_str))

# bytes literal: lxml rejects str input that contains an encoding declaration
xml_string = b"""<?xml version="1.0" encoding="UTF-8"?><Response><NewOrderResp>
<IndustryType>Test</IndustryType><SomeData><SomeNestedData1>1234</SomeNestedData1>
<SomeNestedData2>3455</SomeNestedData2></SomeData></NewOrderResp></Response>"""

print(xml_to_dict(xml_string))

The below variant preserves the parent key / element:

def xml_to_dict(xml_str):
    """ Convert xml to dict, using lxml v3.4.2 xml processing library, see http://lxml.de/ """
    def xml_to_dict_recursion(xml_object):
        dict_object = xml_object.__dict__
        if not dict_object:  # if empty dict returned
            return xml_object
        for key, value in dict_object.items():
            dict_object[key] = xml_to_dict_recursion(value)
        return dict_object
    xml_obj = xml_objectify.fromstring(xml_str)
    return {xml_obj.tag: xml_to_dict_recursion(xml_obj)}

If you want to only return a subtree and convert it to dict, you can use Element.find() to get the subtree and then convert it:

xml_obj.find('.//')  # lxml.objectify.ObjectifiedElement instance
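If lxml is not available, the same idea (locate a subtree first, then convert only that part) can be sketched with the standard library's `xml.etree.ElementTree`. This is a swapped-in technique, not the lxml API used above; the helper `element_to_dict` is hypothetical and deliberately simple: it ignores attributes, and a repeated child tag overwrites the earlier one.

```python
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Return {tag: text} for a leaf, or {tag: {child dicts merged}} for a parent.

    Simplified sketch: attributes are ignored and repeated child tags overwrite.
    """
    if len(elem) == 0:
        return {elem.tag: elem.text}
    children = {}
    for child in elem:
        children.update(element_to_dict(child))
    return {elem.tag: children}


xml_string = """<Response><NewOrderResp><IndustryType>Test</IndustryType>
<SomeData><SomeNestedData1>1234</SomeNestedData1>
<SomeNestedData2>3455</SomeNestedData2></SomeData></NewOrderResp></Response>"""

root = ET.fromstring(xml_string)
subtree = root.find('.//SomeData')  # first SomeData element anywhere below root
print(element_to_dict(subtree))
# {'SomeData': {'SomeNestedData1': '1234', 'SomeNestedData2': '3455'}}
```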

See the lxml docs here. I hope this helps!

radtek

The most recent versions of the PicklingTools libraries (1.3.0 and 1.3.1) support tools for converting from XML to a Python dict.

The download is available here: PicklingTools 1.3.1

There is quite a bit of documentation for the converters here: the documentation describes in detail all of the decisions and issues that will arise when converting between XML and Python dictionaries (there are a number of edge cases: attributes, lists, anonymous lists, anonymous dicts, eval, etc. that most converters don't handle). In general, though, the converters are easy to use. If an 'example.xml' contains:

<top>
  <a>1</a>
  <b>2.2</b>
  <c>three</c>
</top>

Then to convert it to a dictionary:

>>> from xmlloader import *
>>> example = open('example.xml', 'r')   # A document containing XML
>>> xl = StreamXMLLoader(example, 0)     # 0 = all defaults on operation
>>> result = xl.expectXML()
>>> print(result)
{'top': {'a': '1', 'c': 'three', 'b': '2.2'}}

There are tools for converting in both C++ and Python: the C++ and Python do identical conversion, but the C++ is about 60x faster.

nealmcb
rts1
  • Of course, if there are two a's, this is not a good format. – Erik Aronesty Jun 18 '15 at 19:21
  • Looks interesting, but I have not yet figured out how the PicklingTools are meant to be used - is this just a tarball of source code files from which I have to find the right ones for my job and then copy them into my project? No modules to load or anything simpler? – Dirk Apr 18 '16 at 08:22
  • I get: in _peekIntoNextNWSChar c = self.is_.read(1) AttributeError: 'str' object has no attribute 'read' – sqp_125 Nov 27 '19 at 12:39

I wrote a simple recursive function to do the job:

from xml.etree import ElementTree
root = ElementTree.XML(xml_to_convert)

def xml_to_dict_recursive(root):

    if len(root.getchildren()) == 0:
        return {root.tag:root.text}
    else:
        return {root.tag:list(map(xml_to_dict_recursive, root.getchildren()))}
firelion.cis

def xml_to_dict(node):
    u''' 
    @param node:lxml_node
    @return: dict 
    '''

    return {'tag': node.tag, 'text': node.text, 'attrib': node.attrib, 'children': {child.tag: xml_to_dict(child) for child in node}}
dibrovsd

The code from http://code.activestate.com/recipes/410469-xml-as-dictionary/ works well, but if there are multiple elements that are the same at a given place in the hierarchy, it just overwrites them.

I added a shim that checks whether the element already exists before calling self.update(). If so, it pops the existing entry and creates a list out of the existing and the new entries. Any subsequent duplicates are appended to the list.

Not sure if this can be handled more gracefully, but it works:

import xml.etree.ElementTree as ElementTree

class XmlDictConfig(dict):
    def __init__(self, parent_element):
        if parent_element.items():
            self.updateShim(dict(parent_element.items()))
        for element in parent_element:
            if len(element):
                aDict = XmlDictConfig(element)
                if element.items():
                    aDict.updateShim(dict(element.items()))
                self.updateShim({element.tag: aDict})
            elif element.items():
                self.updateShim({element.tag: dict(element.items())})
            else:
                self.updateShim({element.tag: element.text.strip() if element.text else element.text})

    def updateShim (self, aDict ):
        for key in aDict.keys():
            if key in self:
                value = self.pop(key)
                if type(value) is not list:
                    listOfDicts = []
                    listOfDicts.append(value)
                    listOfDicts.append(aDict[key])
                    self.update({key: listOfDicts})

                else:
                    value.append(aDict[key])
                    self.update({key: value})
            else:
                self.update({key: aDict[key]})
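The shim's merge rule (first occurrence stored as-is, second occurrence promotes the entry to a list, later ones are appended) can be isolated into a small standalone helper. A sketch with the hypothetical name `add_value`:

```python
def add_value(target, key, value):
    """Insert value under key, promoting the entry to a list when the key repeats."""
    if key not in target:
        target[key] = value                 # first occurrence: store as-is
    elif isinstance(target[key], list):
        target[key].append(value)           # third and later: extend the list
    else:
        target[key] = [target[key], value]  # second occurrence: promote to a list
    return target


d = {}
for tag, text in [('item', 'a'), ('name', 'x'), ('item', 'b'), ('item', 'c')]:
    add_value(d, tag, text)
print(d)  # {'item': ['a', 'b', 'c'], 'name': 'x'}
```

Like the original shim, this cannot tell a value that is itself a list apart from an accumulated one, and a tag that appears once is not wrapped in a list, so consumers must handle both shapes.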
Adam Clark

@dibrovsd: your solution will not work if the XML has more than one tag with the same name.

Along the same lines, I have modified the code a bit and written it for a general node instead of the root:

from collections import defaultdict


def xml2dict(node):
    d, count = defaultdict(dict), 1
    for i in node:
        key = i.tag + "_" + str(count)  # numeric suffix keeps same-named tags apart
        d[key]['text'] = i.findtext('.')  # the element's own text ('' if none)
        d[key]['attrib'] = i.attrib  # attrib is a dict of the attributes
        d[key]['children'] = xml2dict(i)  # recursion gives a dict
        count += 1
    return d
pg2455

Starting from @K3---rnc's answer (the best one for me), I've made a few small modifications to get an OrderedDict from the XML text (sometimes order matters):

from collections import OrderedDict


def etree_to_ordereddict(t):
    d = OrderedDict()
    d[t.tag] = OrderedDict() if t.attrib else None
    children = list(t)
    if children:
        dd = OrderedDict()
        for dc in map(etree_to_ordereddict, children):
            for k, v in dc.items():
                if k not in dd:
                    dd[k] = list()
                dd[k].append(v)
        d = OrderedDict()
        d[t.tag] = OrderedDict()
        for k, v in dd.items():
            if len(v) == 1:
                d[t.tag][k] = v[0]
            else:
                d[t.tag][k] = v
    if t.attrib:
        d[t.tag].update(('@' + k, v) for k, v in t.attrib.items())
    if t.text:
        text = t.text.strip()
        if children or t.attrib:
            if text:
                d[t.tag]['#text'] = text
        else:
            d[t.tag] = text
    return d

Following @K3---rnc's example, you can use it:

import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9
e = ET.XML('''
<root>
  <e />
  <e>text</e>
  <e name="value" />
  <e name="value">text</e>
  <e> <a>text</a> <b>text</b> </e>
  <e> <a>text</a> <a>text</a> </e>
  <e> text <a>text</a> </e>
</root>
''')

from pprint import pprint
pprint(etree_to_ordereddict(e))

Hope it helps ;)

Community
serfer2

An alternative (builds lists for tags repeated at the same place in the hierarchy):

from xml.etree import ElementTree  # cElementTree was removed in Python 3.9

def xml_to_dict(xml, result):
    for child in xml:
        if len(child) == 0:
            result[child.tag] = child.text
        else:
            if child.tag in result:
                if not isinstance(result[child.tag], list):
                    result[child.tag] = [result[child.tag]]
                result[child.tag].append(xml_to_dict(child, {}))
            else:
                result[child.tag] = xml_to_dict(child, {})
    return result

xmlTree = ElementTree.parse('my_file.xml')
xmlRoot = xmlTree.getroot()
dictRoot = xml_to_dict(xmlRoot, {})
result = {xmlRoot.tag: dictRoot}

Here's a link to an ActiveState solution - and the code in case it disappears again.

==================================================
xmlreader.py:
==================================================
from xml.dom.minidom import parse


class NotTextNodeError(Exception):
    pass


def getTextFromNode(node):
    """
    scans through all children of node and gathers the
    text. if node has non-text child-nodes, then
    NotTextNodeError is raised.
    """
    t = ""
    for n in node.childNodes:
        if n.nodeType == n.TEXT_NODE:
            t += n.nodeValue
        else:
            raise NotTextNodeError
    return t


def nodeToDic(node):
    """
    nodeToDic() scans through the children of node and makes a
    dictionary from the content.
    three cases are differentiated:
    - if the node contains no other nodes, it is a text-node
    and {nodeName:text} is merged into the dictionary.
    - if the node has the attribute "multiple" set to "true",
    then its children will be appended to a list and this
    list is merged to the dictionary in the form: {nodeName:list}.
    - else, nodeToDic() will call itself recursively on
    the node's children (merging {nodeName:nodeToDic()} to
    the dictionary).
    """
    dic = {}
    for n in node.childNodes:
        if n.nodeType != n.ELEMENT_NODE:
            continue
        if n.getAttribute("multiple") == "true":
            # node with multiple children:
            # put them in a list
            l = []
            for c in n.childNodes:
                if c.nodeType != n.ELEMENT_NODE:
                    continue
                l.append(nodeToDic(c))
            dic.update({n.nodeName: l})
            continue

        try:
            text = getTextFromNode(n)
        except NotTextNodeError:
            # 'normal' node
            dic.update({n.nodeName: nodeToDic(n)})
            continue

        # text node
        dic.update({n.nodeName: text})
    return dic


def readConfig(filename):
    dom = parse(filename)
    return nodeToDic(dom)





def test():
    dic = readConfig("sample.xml")

    print(dic["Config"]["Name"])
    print()
    for item in dic["Config"]["Items"]:
        print("Item's Name:", item["Name"])
        print("Item's Value:", item["Value"])

test()



==================================================
sample.xml:
==================================================
<?xml version="1.0" encoding="UTF-8"?>

<Config>
    <Name>My Config File</Name>

    <Items multiple="true">
    <Item>
        <Name>First Item</Name>
        <Value>Value 1</Value>
    </Item>
    <Item>
        <Name>Second Item</Name>
        <Value>Value 2</Value>
    </Item>
    </Items>

</Config>



==================================================
output:
==================================================
My Config File

Item's Name: First Item
Item's Value: Value 1
Item's Name: Second Item
Item's Value: Value 2
Jamie Bull
tgray

Updated version of the method posted by firelion.cis (`getchildren()` was deprecated and has been removed in Python 3.9):

from xml.etree import ElementTree
root = ElementTree.XML(xml_to_convert)

def xml_to_dict_recursive(root):

    if len(list(root)) == 0:
        return {root.tag:root.text}
    else:
        return {root.tag:list(map(xml_to_dict_recursive, list(root)))}
fvg

I have modified one of the answers to my taste and to work with multiple values with the same tag. For example, consider the following XML saved in the file XML.xml:

     <A>
        <B>
            <BB>inAB</BB>
            <C>
                <D>
                    <E>
                        inABCDE
                    </E>
                    <E>value2</E>
                    <E>value3</E>
                </D>
                <inCout-ofD>123</inCout-ofD>
            </C>
        </B>
        <B>abc</B>
        <F>F</F>
    </A>

and in Python:

import xml.etree.ElementTree as ET




class XMLToDictionary(dict):
    def __init__(self, parentElement):
        self.parentElement = parentElement
        for child in list(parentElement):
            child.text = child.text if child.text is not None else ' '
            if len(child) == 0:
                self.update(self._addToDict(key= child.tag, value = child.text.strip(), dict = self))
            else:
                innerChild = XMLToDictionary(parentElement=child)
                self.update(self._addToDict(key=innerChild.parentElement.tag, value=innerChild, dict=self))

    def getDict(self):
        return {self.parentElement.tag: self}

    class _addToDict(dict):
        def __init__(self, key, value, dict):
            if not key in dict:
                self.update({key: value})
            else:
                identical = dict[key] if type(dict[key]) == list else [dict[key]]
                self.update({key: identical + [value]})


tree = ET.parse('./XML.xml')
root = tree.getroot()
parseredDict = XMLToDictionary(root).getDict()
print(parseredDict)

the output is

{'A': {'B': [{'BB': 'inAB', 'C': {'D': {'E': ['inABCDE', 'value2', 'value3']}, 'inCout-ofD': '123'}}, 'abc'], 'F': 'F'}}
coder

import xml.etree.ElementTree as ET
root = ET.parse(xml_filepath).getroot()

def parse_xml(node):
    ans = {}
    for child in node:
        if len(child) == 0:
            ans[child.tag] = child.text
        elif child.tag not in ans:
            ans[child.tag] = parse_xml(child)
        elif not isinstance(ans[child.tag], list):
            ans[child.tag] = [ans[child.tag]]
            ans[child.tag].append(parse_xml(child))
        else:
            ans[child.tag].append(parse_xml(child))
    return ans

It merges repeated fields into a list and squeezes fields containing one child.

Slightly improved version of fvg's fix of firelion.cis's answer. The function is simple, works for simple XML, and avoids the innermost singleton dictionaries. It is NOT suitable for complex XML with tags, or if the XML has more than one tag with the same name.

from xml.etree import ElementTree

# Replace xml_to_convert below
root = ElementTree.XML(xml_to_convert)

def xml_to_dict(root):
    if len(root):
        return {root.tag:{k:v for d in map(xml_to_dict, root)
                              for k,v in d.items() }}
    else:
        return {root.tag:root.text}

Sample XML:

<student>
    <FirstName>SMITH</FirstName>
    <LastName>JAMES</LastName>
    <fees>
        <Amount>2400</Amount>
        <Currency>USD</Currency>
    </fees>
</student>

Output (formatted):

{'student': {'FirstName': 'SMITH',
             'LastName': 'JAMES',
             'fees': {'Amount': '2400', 
                      'Currency': 'USD'}
            }
}
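To make the same-name limitation mentioned above concrete, here is the function from this answer, unchanged, applied to an element with two `<item>` children. The dict comprehension keeps only the last one, silently dropping the first:

```python
from xml.etree import ElementTree


def xml_to_dict(root):
    if len(root):
        # merging the child dicts: a later duplicate key overwrites an earlier one
        return {root.tag: {k: v for d in map(xml_to_dict, root)
                           for k, v in d.items()}}
    else:
        return {root.tag: root.text}


root = ElementTree.XML("<list><item>first</item><item>second</item></list>")
print(xml_to_dict(root))  # {'list': {'item': 'second'}}
```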

At one point I had to parse and write XML that consisted only of elements without attributes, so a 1:1 mapping from XML to dict was easily possible. This is what I came up with, in case someone else also doesn't need attributes:

from xml.etree import ElementTree


def xmltodict(element):
    if not isinstance(element, ElementTree.Element):
        raise ValueError("must pass xml.etree.ElementTree.Element object")

    def xmltodict_handler(parent_element):
        result = dict()
        for element in parent_element:
            if len(element):
                obj = xmltodict_handler(element)
            else:
                obj = element.text

            if element.tag in result:
                if hasattr(result[element.tag], "append"):
                    result[element.tag].append(obj)
                else:
                    result[element.tag] = [result[element.tag], obj]
            else:
                result[element.tag] = obj
        return result

    return {element.tag: xmltodict_handler(element)}


def dicttoxml(element):
    if not isinstance(element, dict):
        raise ValueError("must pass dict type")
    if len(element) != 1:
        raise ValueError("dict must have exactly one root key")

    def dicttoxml_handler(result, key, value):
        if isinstance(value, list):
            for e in value:
                dicttoxml_handler(result, key, e)
        elif isinstance(value, str):  # basestring in Python 2
            elem = ElementTree.Element(key)
            elem.text = value
            result.append(elem)
        elif isinstance(value, int) or isinstance(value, float):
            elem = ElementTree.Element(key)
            elem.text = str(value)
            result.append(elem)
        elif value is None:
            result.append(ElementTree.Element(key))
        else:
            res = ElementTree.Element(key)
            for k, v in value.items():
                dicttoxml_handler(res, k, v)
            result.append(res)

    root_tag = next(iter(element))  # the single root key
    result = ElementTree.Element(root_tag)
    for key, value in element[root_tag].items():
        dicttoxml_handler(result, key, value)
    return result

def xmlfiletodict(filename):
    return xmltodict(ElementTree.parse(filename).getroot())

def dicttoxmlfile(element, filename):
    ElementTree.ElementTree(dicttoxml(element)).write(filename)

def xmlstringtodict(xmlstring):
    return xmltodict(ElementTree.fromstring(xmlstring))  # fromstring() already returns the root Element

def dicttoxmlstring(element):
    return ElementTree.tostring(dicttoxml(element))
josch

I have a recursive method to get a dictionary from an lxml element:

def recursive_dict(element):
    # strip a "{namespace}" prefix from the tag, if there is one
    return (element.tag.split('}')[-1],
            dict(map(recursive_dict, list(element)),
                 **element.attrib))
moylop260