
I am having a problem with Python's xmltodict. Following the near-consensus recommendation here, I tried xmltodict and liked it very much until I had to access attributes at the top level of my handler. I'm probably doing something wrong, but it's not clear to me what. I have an XML document that looks something like this:

<api>
<cons id="79550" modified_dt="1526652449">
<firstname>Mackenzie</firstname>
...
</cons>
<cons id="79551" modified_dt="1526652549">
<firstname>Joe</firstname>
...
</cons>
</api>

I parse it with this:

xmltodict.parse(apiResult.body, item_depth=2, item_callback=handler, xml_attribs=True)

where apiResult.body contains the XML shown above. But in spite of xml_attribs=True, I see no @id or @modified_dt in what the handler prints, although all the child elements from the original do appear.

The handler is coded as follows:

def handler(_, cons):
    print(cons)
    mc = MatchChecker(cons)
    mc.check()
    return True

What might I be doing wrong?

I've also tried xmljson, and I immediately liked it less than xmltodict; I would stay with xmltodict if I could find a way around this issue. Does anyone have a solution to this problem, or know of a package that handles this better?

Steve Cohen

1 Answer

xmltodict works just fine, but you are passing the argument item_depth=2, which means the item your handler receives contains only the elements inside each <cons> element, not the attributes of the <cons> element itself.

import xmltodict

xml = """
<api>
<cons id="79550" modified_dt="1526652449">
<firstname>Mackenzie</firstname>
</cons>
</api>
"""

def handler(_, arg):
    for i in arg.items():
        print(i)
    return True

xmltodict.parse(xml, item_depth=2, item_callback=handler, xml_attribs=True)

Prints ('firstname', 'Mackenzie') as expected.

Whereas:

xmltodict.parse(xml, item_depth=1, item_callback=handler, xml_attribs=True)

Prints ('cons', OrderedDict([('@id', '79550'), ('@modified_dt', '1526652449'), ('firstname', 'Mackenzie')])), again as expected.
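
If you want to stay at item_depth=2, here is a minimal sketch of one possible workaround. It relies on the behaviour described in the xmltodict parse docstring: the first argument to item_callback is the path from the document root, given as a list of (element name, attributes) pairs, so the attributes of the current <cons> should be available as path[-1][1]. The records list and the choice to merge the attributes back in under an "@" prefix are illustrative assumptions, not part of the original code, and the handler assumes every <cons> has child elements.

```python
import xmltodict

xml = """
<api>
<cons id="79550" modified_dt="1526652449">
<firstname>Mackenzie</firstname>
</cons>
<cons id="79551" modified_dt="1526652549">
<firstname>Joe</firstname>
</cons>
</api>
"""

# Hypothetical collector, used here only to make the result easy to inspect.
records = []

def handler(path, item):
    # path is a list of (element name, attributes) pairs from the root down,
    # so the last entry describes the <cons> element being reported.
    _name, attrs = path[-1]
    # Re-attach the attributes with the usual "@" prefix (an illustrative
    # choice), then add the child elements. attrs is None when there are none.
    record = {"@" + key: value for key, value in (attrs or {}).items()}
    record.update(item)
    records.append(record)
    return True  # keep parsing

xmltodict.parse(xml, item_depth=2, item_callback=handler)

for record in records:
    print(record)
```

With this approach the handler can keep using d["firstname"] and read the identifier as d["@id"], without moving to item_depth=1.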

isedev
  • Thank you! That makes sense, I guess, but with the depth of 2, I was able to access the firstname field as d["firstname"], as I expected to, but with a depth of 1, will I have to access it as d["cons"]["firstname"]? – Steve Cohen May 18 '18 at 22:10
  • @SteveCohen yes, that's right, but it then lets you access the `id`, for example, as `d["cons"]["@id"]`. – isedev May 18 '18 at 22:15
  • No, this still doesn't make sense to me. Depth 1 returns an ARRAY of depth 2 elements corresponding to the cons elements in the xml. The id is logically associated with the depth 2 element. When I work at depth 1, even if I use the d["cons"]["firstname"] syntax, I get the "list indices must be integers, not str" error. I suppose I could iterate through the list in the handler, but why should I have to? Depth 2 is what I want and I still don't understand why the attributes don't show up. – Steve Cohen May 18 '18 at 23:00
  • I think this is a bug in xmltodict. I think from a look at its code, all depth levels below the specified depth automatically get the xml_attribs set true (whether asked for or not), but at the requested depth, xml_attribs is False. That is why using a depth level of 1 in my case gets the attributes. In my case the attributes belong at depth level 2. – Steve Cohen May 18 '18 at 23:25
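
On the "list indices must be integers, not str" error mentioned in the comments: when a document holds several <cons> elements, xmltodict collects them into a list under the "cons" key, so d["cons"]["firstname"] fails. The following is a small sketch, assuming a non-streaming parse is acceptable for the document size. It uses the force_list parameter of xmltodict.parse so that "cons" is always a list, even for a single element, and then iterates over it. The variable names doc and cons are illustrative.

```python
import xmltodict

xml = """
<api>
<cons id="79550" modified_dt="1526652449">
<firstname>Mackenzie</firstname>
</cons>
<cons id="79551" modified_dt="1526652549">
<firstname>Joe</firstname>
</cons>
</api>
"""

# force_list=("cons",) asks xmltodict to wrap every <cons> in a list,
# so the loop below works whether the document has one <cons> or many.
doc = xmltodict.parse(xml, force_list=("cons",))

for cons in doc["api"]["cons"]:
    # Attributes carry the default "@" prefix; child elements do not.
    print(cons["@id"], cons["@modified_dt"], cons["firstname"])
```

This trades the streaming callback for a single in-memory structure, which may not suit very large API responses.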