
I'm trying to make a dict class to process an XML string, but I'm stuck and have really run out of ideas. If someone could guide me on this subject, that would be great.

Code developed so far:

class XMLResponse(dict):
    def __init__(self, xml):
        self.result = True
        self.message = ''
        pass

    def __setattr__(self, name, val):
        self[name] = val

    def __getattr__(self, name):
        if name in self:
            return self[name]
        return None

message = """<?xml version="1.0"?><note><to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Don't forget me this weekend!</body></note>"""
XMLResponse(message)
alecxe
Alfredo Solís
  • It appears this question has been answered before: http://stackoverflow.com/questions/2148119/how-to-convert-an-xml-string-to-a-dictionary-in-python – robjohncox Jun 18 '13 at 19:24
    What is your desired output? – alecxe Jun 18 '13 at 19:24
  • @Josh I don't understand your idea, friend. – Alfredo Solís Jun 18 '13 at 19:26
  • @robjohncox I looked at the solution there before, but had no success. – Alfredo Solís Jun 18 '13 at 19:28
  • @alecxe something like this: {"to":"Tove", "from":"Jani", "heading":"Reminder", "body":"Don't forget me this weekend!"} – Alfredo Solís Jun 18 '13 at 19:29
  • @funktasmas: So you don't want `{"note": {"to":"Tove", "from":"Jani", "heading":"Reminder", "body":"Don't forget me this weekend!"}}`? Do you always want the `` node, or do you always want one level below the top, or do you always want the leaves only, or…? – abarnert Jun 18 '13 at 19:47
  • @abarnert I don't always want the "note" node; at other times I may need another node. – Alfredo Solís Jun 18 '13 at 19:49
  • @funktasmas: Does that mean you _do_ want the larger dict (with `"note"` mapped to the inner dict)? Or that you want different things at different times and can't describe what you actually want? – abarnert Jun 18 '13 at 19:50
  • @abarnert To make my idea clear: I want to load an XML document into a dict, from which I can take whatever information I need at the time, whether that's note, name, etc. – Alfredo Solís Jun 18 '13 at 19:54
  • @funktasmas: I don't understand your answer. Do you want the larger dict from my comment, or not? If not, what is the rule used to get the smaller dict from your comment? – abarnert Jun 18 '13 at 20:00
  • possible duplicate of [How to convert XML to Dict](http://stackoverflow.com/questions/3852968/how-to-convert-xml-to-dict) – Mirzhan Irkegulov Feb 19 '15 at 00:49

4 Answers


You can make use of the xmltodict module:

import xmltodict

message = """<?xml version="1.0"?><note><to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Don't forget me this weekend!</body></note>"""
print(xmltodict.parse(message)['note'])

which produces an OrderedDict:

OrderedDict([(u'to', u'Tove'), (u'from', u'Jani'), (u'heading', u'Reminder'), (u'body', u"Don't forget me this weekend!")])

which can be converted to a plain dict if order doesn't matter:

print(dict(xmltodict.parse(message)['note']))

Prints:

{u'body': u"Don't forget me this weekend!", u'to': u'Tove', u'from': u'Jani', u'heading': u'Reminder'}
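Note that `dict()` only converts the outermost level; if `note` had nested elements, their values would still be `OrderedDict`s. A small recursive helper can flatten everything to plain dicts. This is a sketch: `to_plain_dict` is a hypothetical name, and the sample input is built by hand to mimic the shape of xmltodict's output rather than produced by xmltodict itself.

```python
from collections import OrderedDict


def to_plain_dict(value):
    """Recursively convert OrderedDicts (and lists of them) into plain dicts."""
    if isinstance(value, dict):  # OrderedDict is a dict subclass
        return {key: to_plain_dict(val) for key, val in value.items()}
    if isinstance(value, list):
        return [to_plain_dict(item) for item in value]
    return value  # strings, None, etc. are returned unchanged


# Hand-built stand-in for a nested xmltodict result.
parsed = OrderedDict([
    ('to', 'Tove'),
    ('meta', OrderedDict([('lang', 'en'), ('tags', ['a', 'b'])])),
])
plain = to_plain_dict(parsed)
print(plain)  # {'to': 'Tove', 'meta': {'lang': 'en', 'tags': ['a', 'b']}}
print(type(plain['meta']))  # <class 'dict'>
```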
alecxe
  • Thanks for the help, I really appreciate it, but I'm still looking at how to do it without an additional module. I'll try anyway. – Alfredo Solís Jun 18 '13 at 19:44
    @funktasmas: If you want to see how to do it without an additional module, why not look at the source for `xmltodict`? It's a couple hundred lines of clean, well-commented Python code. And it's certainly going to be better than any quick & dirty hack someone comes up with for an answer on SO. – abarnert Jun 18 '13 at 19:49
  • @abarnert I was just looking at how the module is implemented; maybe it's a good way to start. – Alfredo Solís Jun 18 '13 at 19:51
  • @funktasmas: Given the licensing and dev history of the module, probably the best way to start is to just fork it and start playing with your fork. That way, if you come up with anything that was missing in the original, and that you want to share with the world, you can just submit a pull request back upstream. – abarnert Jun 18 '13 at 20:03
  • @abarnert If I develop some code for this, I have no problem sharing it with the community. – Alfredo Solís Jun 18 '13 at 20:25
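For the question's original goal (a `dict` subclass built without a third-party module), here is a minimal sketch using only the standard library's `xml.etree.ElementTree`. It assumes, as in the desired output from the comments, that you want the root element's children as keys and that each child is a simple leaf with text; nested and repeated elements are not handled.

```python
import xml.etree.ElementTree as ET


class XMLResponse(dict):
    """Dict of the root element's children, keyed by tag (sketch).

    Assumption: every child is a simple leaf like <to>Tove</to>;
    nested elements and repeated tags are not handled.
    """

    def __init__(self, xml):
        super().__init__()
        root = ET.fromstring(xml)
        self.root_tag = root.tag  # stored as an attribute, not a key
        for child in root:
            self[child.tag] = child.text

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails,
        # so dict methods such as .keys() keep working.
        return self.get(name)


message = """<?xml version="1.0"?><note><to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Don't forget me this weekend!</body></note>"""
response = XMLResponse(message)
print(response)
# {'to': 'Tove', 'from': 'Jani', 'heading': 'Reminder', 'body': "Don't forget me this weekend!"}
print(response.heading)  # Reminder
```

Attribute access is just a convenience here: `response.from` is a syntax error because `from` is a Python keyword, so that key still needs `response['from']`.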

You'd think that by now we'd have a good answer to this one, but apparently we don't. After reviewing half a dozen similar questions on Stack Overflow, here is what worked for me:

from lxml import etree
# arrow is an awesome lib for dealing with dates in python
import arrow


# Converts an etree element to a dict; useful to convert XML to a dict
def etree2dict(tree):
    root, contents = recursive_dict(tree)
    return {root: contents}


def recursive_dict(element):
    # Elements marked type="array" become a list of their children's values
    if element.get('type') == 'array':
        return element.tag, [dict(map(recursive_dict, child)) or getElementValue(child)
                             for child in element]
    # Otherwise recurse into the children; leaves fall back to their value
    return element.tag, dict(map(recursive_dict, element)) or getElementValue(element)


def getElementValue(element):
    if element.text:
        text = element.text.strip()
        attr_type = element.get('type')
        if attr_type == 'integer':
            return int(text)
        if attr_type == 'float':
            return float(text)
        if attr_type == 'boolean':
            return text.lower() == 'true'
        if attr_type == 'datetime':
            # arrow >= 1.0: timestamp() is a method (older versions used a property)
            return arrow.get(text).timestamp()
        # No type attribute, or an unrecognized one: return the raw text
        return element.text
    if element.attrib:
        if 'nil' in element.attrib:
            return None
        return dict(element.attrib)
    return None

And this is how you use it:

from lxml import etree

message = """<?xml version="1.0"?><note><to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Don't forget me this weekend!</body></note>"""
tree = etree.fromstring(message)
print(etree2dict(tree))

Hope it helps :-)

Fred
  • This worked for me. I had to add `.getroot()` to `tree` in the call, as in `etree2dict(tree.getroot())`. Maybe this is because I read the XML from a file and not from a string? In any case, great answer. – Mabyn Oct 12 '18 at 04:07
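As the comment notes, parsing a file (rather than a string) gives you an ElementTree, so you need `.getroot()` first. Here is a minimal sketch of the same typed-attribute idea using the standard library's `xml.etree.ElementTree` instead of lxml. `convert`, `leaf_value` and `CASTS` are hypothetical names; the `datetime` branch (which needs arrow) and the `nil`/attribute fallbacks are left out, and `io.BytesIO` stands in for a real file.

```python
import io
import xml.etree.ElementTree as ET

# Type hints recognized in the type="..." attribute (datetime omitted).
CASTS = {
    'integer': int,
    'float': float,
    'boolean': lambda s: s.lower() == 'true',
}


def leaf_value(element):
    """Convert a leaf's text using its type attribute, if any."""
    text = (element.text or '').strip()
    if not text:
        return None
    cast = CASTS.get(element.get('type'))
    return cast(text) if cast else element.text


def convert(element):
    """Recursively turn an element into a list, dict or scalar."""
    children = list(element)
    if element.get('type') == 'array':
        return [convert(child) for child in children]
    if children:
        return {child.tag: convert(child) for child in children}
    return leaf_value(element)


xml_file = io.BytesIO(b"""<?xml version="1.0"?>
<user>
  <id type="integer">42</id>
  <admin type="boolean">false</admin>
  <tags type="array"><tag>a</tag><tag>b</tag></tags>
</user>""")
root = ET.parse(xml_file).getroot()  # parse() returns an ElementTree
print({root.tag: convert(root)})
# {'user': {'id': 42, 'admin': False, 'tags': ['a', 'b']}}
```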

You should check out

https://github.com/martinblech/xmltodict

I think it is one of the best XML-to-dict handlers I have seen.

However, I should warn you that XML and dicts are not fully compatible data structures.
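One concrete example of the mismatch: XML allows repeated sibling tags, while dict keys are unique, so a naive `{child.tag: child.text}` conversion silently keeps only the last one. A minimal standard-library sketch; the `children_to_dict` name and the collect-into-a-list rule are assumptions for illustration, similar in spirit to how xmltodict turns repeated elements into lists.

```python
import xml.etree.ElementTree as ET

xml = "<cart><item>apple</item><item>pear</item><owner>Ann</owner></cart>"
root = ET.fromstring(xml)

# Naive conversion: the second <item> overwrites the first.
naive = {child.tag: child.text for child in root}


def children_to_dict(element):
    """Map child tags to text; repeated tags are collected into a list."""
    result = {}
    for child in element:
        if child.tag not in result:
            result[child.tag] = child.text
        elif isinstance(result[child.tag], list):
            result[child.tag].append(child.text)
        else:
            result[child.tag] = [result[child.tag], child.text]
    return result


print(naive)                   # {'item': 'pear', 'owner': 'Ann'}
print(children_to_dict(root))  # {'item': ['apple', 'pear'], 'owner': 'Ann'}
```

The catch is that a key's type now depends on the data (a string for one `<item>`, a list for two), which is exactly the kind of decision a generic converter has to make for you.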

dusual
  • Thanks for your answer. I also guess they are not fully compatible structures, and that there is no solution as quick as I had hoped. – Alfredo Solís Jun 18 '13 at 19:36
  • @funktasmas: The only big issue for simple cases is that XML nodes can have attributes as well as sub-nodes, and you have to decide how to represent that. `xmltodict` represents attributes as nodes with a `@` prefix on their name, which is one way to solve the problem, but there are other possibilities—e.g., you can handle nodes with `__getitem__` and attrs with `__getattr__`. – abarnert Jun 18 '13 at 20:59
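The second option mentioned in the comment above (sub-nodes through `__getitem__`, attributes through `__getattr__`) can be sketched as a thin wrapper around a standard-library `ElementTree` element. The `Node` class is hypothetical, and it assumes child tags are unique per parent (the first match wins).

```python
import xml.etree.ElementTree as ET


class Node:
    """Wrap an Element: node['child'] for sub-nodes, node.attr for attributes."""

    def __init__(self, element):
        self._element = element

    def __getitem__(self, tag):
        child = self._element.find(tag)  # first direct child with this tag
        if child is None:
            raise KeyError(tag)
        return Node(child)

    def __getattr__(self, name):
        # Only called when normal lookup fails; private/dunder names are
        # refused so copy/pickle probes don't recurse into self._element.
        if name.startswith('_'):
            raise AttributeError(name)
        try:
            return self._element.attrib[name]
        except KeyError:
            raise AttributeError(name) from None

    @property
    def text(self):
        return self._element.text


note = Node(ET.fromstring('<note lang="en"><to type="person">Tove</to></note>'))
print(note.lang)        # en
print(note['to'].text)  # Tove
print(note['to'].type)  # person
```

One trade-off of this design: because `text` is a property, an XML attribute literally named `text` is shadowed, and attribute names that aren't valid Python identifiers (such as `xml:lang` or `data-id`) need `getattr(node, name)`.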

You can use the lxml library. Convert the string to an XML object using `objectify.fromstring` and then read the object's `__dict__` attribute. For example:

from lxml import objectify

# bytes literal: lxml rejects str input that has an encoding declaration
xml_string = b"""<?xml version="1.0" encoding="UTF-8"?><NewOrderResp><IndustryType></IndustryType><MessageType>R</MessageType><MerchantID>700000005894</MerchantID><TerminalID>0031</TerminalID><CardBrand>AMEX</CardBrand><AccountNum>3456732800000010</AccountNum><OrderID>TESTORDER1</OrderID><TxRefNum>55A69B278025130CD36B3A95435AA84DC45363</TxRefNum><TxRefIdx>10</TxRefIdx><ProcStatus>0</ProcStatus><ApprovalStatus>1</ApprovalStatus><RespCode></RespCode><AVSRespCode></AVSRespCode><CVV2RespCode></CVV2RespCode><AuthCode></AuthCode><RecurringAdviceCd></RecurringAdviceCd><CAVVRespCode></CAVVRespCode><StatusMsg></StatusMsg><RespMsg></RespMsg><HostRespCode></HostRespCode><HostAVSRespCode></HostAVSRespCode><HostCVV2RespCode></HostCVV2RespCode><CustomerRefNum>A51C5B2B1811E5991208</CustomerRefNum><CustomerName>BOB STEVEN</CustomerName><ProfileProcStatus>0</ProfileProcStatus><CustomerProfileMessage>Profile Created</CustomerProfileMessage><RespTime>13055</RespTime><PartialAuthOccurred></PartialAuthOccurred><RequestedAmount></RequestedAmount><RedeemedAmount></RedeemedAmount><RemainingBalance></RemainingBalance><CountryFraudFilterStatus></CountryFraudFilterStatus><IsoCountryCode></IsoCountryCode></NewOrderResp>"""

xml_object = objectify.fromstring(xml_string)

print(xml_object.__dict__)

Printing the object's `__dict__` gives a dict:

{'RemainingBalance': u'', 'AVSRespCode': u'', 'RequestedAmount': u'', 'AccountNum': 3456732800000010, 'IsoCountryCode': u'', 'HostCVV2RespCode': u'', 'TerminalID': 31, 'CVV2RespCode': u'', 'RespMsg': u'', 'CardBrand': 'AMEX', 'MerchantID': 700000005894, 'RespCode': u'', 'ProfileProcStatus': 0, 'CustomerName': 'BOB STEVEN', 'PartialAuthOccurred': u'', 'MessageType': 'R', 'ProcStatus': 0, 'TxRefIdx': 10, 'RecurringAdviceCd': u'', 'IndustryType': u'', 'OrderID': 'TESTORDER1', 'StatusMsg': u'', 'ApprovalStatus': 1, 'RedeemedAmount': u'', 'CountryFraudFilterStatus': u'', 'TxRefNum': '55A69B278025130CD36B3A95435AA84DC45363', 'CustomerRefNum': 'A51C5B2B1811E5991208', 'CustomerProfileMessage': 'Profile Created', 'AuthCode': u'', 'RespTime': 13055, 'HostAVSRespCode': u'', 'CAVVRespCode': u'', 'HostRespCode': u''}

The XML string I used is a response from the Paymentech payment gateway, just to show a real-world example.

Also note that the above example is not recursive, so if there are dicts within dicts you have to recurse. Here is a recursive function I wrote that you can use:

from lxml import objectify

def xml_to_dict_recursion(xml_object):
    dict_object = xml_object.__dict__
    if not dict_object:
        return xml_object
    for key, value in dict_object.items():
        dict_object[key] = xml_to_dict_recursion(value)
    return dict_object

def xml_to_dict(xml_str):
    return xml_to_dict_recursion(objectify.fromstring(xml_str))

# bytes literal: lxml rejects str input that has an encoding declaration
xml_string = b"""<?xml version="1.0" encoding="UTF-8"?><Response><NewOrderResp>
<IndustryType>Test</IndustryType><SomeData><SomeNestedData1>1234</SomeNestedData1>
<SomeNestedData2>3455</SomeNestedData2></SomeData></NewOrderResp></Response>"""

print(xml_to_dict(xml_string))

Here's a variant that preserves the parent key/element:

from lxml import objectify

def xml_to_dict(xml_str):
    """ Convert xml to dict, using lxml v3.4.2 xml processing library, see http://lxml.de/ """
    def xml_to_dict_recursion(xml_object):
        dict_object = xml_object.__dict__
        if not dict_object:  # if empty dict returned
            return xml_object
        for key, value in dict_object.items():
            dict_object[key] = xml_to_dict_recursion(value)
        return dict_object
    xml_obj = objectify.fromstring(xml_str)
    return {xml_obj.tag: xml_to_dict_recursion(xml_obj)}

And if you only want to return a subtree and convert it to a dict, you can use `Element.find()`:

xml_obj.find('.//SomeData')  # lxml.objectify.ObjectifiedElement instance (first <SomeData> below the root)
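If you would rather stay in the standard library, `xml.etree.ElementTree` supports the same kind of `.//tag` descendant search in `find()`. A minimal sketch on the nested example above; `subtree_to_dict` is a hypothetical helper that keeps plain strings at the leaves (no numeric conversion, unlike objectify).

```python
import xml.etree.ElementTree as ET

xml_string = b"""<?xml version="1.0" encoding="UTF-8"?><Response><NewOrderResp>
<IndustryType>Test</IndustryType><SomeData><SomeNestedData1>1234</SomeNestedData1>
<SomeNestedData2>3455</SomeNestedData2></SomeData></NewOrderResp></Response>"""


def subtree_to_dict(element):
    """Recursively map child tags to sub-dicts, or to text at the leaves."""
    if len(element) == 0:  # no child elements: this is a leaf
        return element.text
    return {child.tag: subtree_to_dict(child) for child in element}


root = ET.fromstring(xml_string)
some_data = root.find('.//SomeData')  # first <SomeData> anywhere below root
print({some_data.tag: subtree_to_dict(some_data)})
# {'SomeData': {'SomeNestedData1': '1234', 'SomeNestedData2': '3455'}}
```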

There are many options to accomplish this, but this one is great if you're already using lxml. In this example lxml 3.4.2 was used. Cheers!

radtek
  • If you want a true Python dict (not a dict of lxml objects), then change the leaf case in `xml_to_dict_recursion` to `return xml_object.pyval` – sam-6174 Nov 15 '19 at 20:50