I have this XML file which contains item elements inside an items element. I only get the first one and cannot loop through all of them. Here is the code and the XML structure:

from lxml import objectify as xml_objectify

contents = open('/home/conacons/Documents/order.xml').read()

def xml_to_dict(xml_str):
    """ Convert xml to dict, using lxml v3.4.2 xml processing library """
    def xml_to_dict_recursion(xml_object):
        dict_object = xml_object.__dict__
        if not dict_object:
            return xml_object
        for key, value in dict_object.items():
            dict_object[key] = xml_to_dict_recursion(value)
        return dict_object
    return xml_to_dict_recursion(xml_objectify.fromstring(xml_str))

xml_dict = xml_to_dict(contents)
#print xml_dict
for item,v in xml_dict['item']['items'].items():
    print item,v

<Order>
<item>
<customer></customer>
<status>no</status>
<amount_untaxed>7315.0</amount_untaxed>
<name>Test/001</name>
<confirmation_date>False</confirmation_date>
<order_id>8</order_id>
<items>
<item><list_price>16.5</list_price><description>False</description><weight>0.0</weight><default_code/><id>18</id><uom>Unit(s)</uom><name>iPod</name></item>
<item><list_price>12.5</list_price><description>False</description><weight>0.0</weight><default_code>M-Wir</default_code><id>19</id><uom>Unit(s)</uom><name>Mouse, Wireless</name></item>

When I run this code I get only one of the items. How can I make the loop return every item in items? Thanks.

Output:

item {'list_price': 16.5, 'description': 'False', 'weight': 0.0, 'default_code': u'', 'id': 18, 'uom': 'Unit(s)', 'name': 'iPod'}

nepix32
  • Can you post a valid xml document? This one has some errors. For example there are no closing tags for Order, first "item" tag, etc. – Eduard Stepanov Jun 23 '17 at 07:30
  • Here is the full order.xml doc https://pastebin.com/sUsbRqAz – The Agri CULTURE Guy Jun 23 '17 at 07:33
  • You can either use xml and process it via xml libraries or use json and convert it into dict for processing. Converting xml into dict for processing is usually a bad idea. – marbu Jun 29 '17 at 08:41
  • Isn't this a copy of [https://stackoverflow.com/q/7684333/3147247](https://stackoverflow.com/q/7684333/3147247)? – kafran Aug 12 '20 at 15:02

1 Answer

There is a problem in your approach. The XML object cannot be converted to a dict this way because a dict cannot have duplicate keys. In your case, calling xml_object.__dict__ on an xml_object with several item child tags returns a dict with only one item key. So you should use the getchildren method instead of __dict__. But there is another problem: for the xml_object corresponding to the items tag in your example, the following code also won't work correctly:

for child in xml_object.getchildren():
    dict_object[child.tag] = xml_to_dict_recursion(child)

The reason is that child.tag has the same value in every loop iteration, so each assignment overwrites the previous one.
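A minimal sketch of that overwrite, using the standard library's xml.etree.ElementTree as a stand-in for lxml so that it runs without extra packages. The sample XML is cut down from the question, and the by_tag name is only illustrative:

```python
import xml.etree.ElementTree as ET

# Cut-down sample: two sibling elements that share the tag "item".
xml_str = (
    "<items>"
    "<item><name>iPod</name></item>"
    "<item><name>Mouse, Wireless</name></item>"
    "</items>"
)
items = ET.fromstring(xml_str)

# Keying a plain dict by tag: the second "item" replaces the first.
by_tag = {}
for child in items:
    by_tag[child.tag] = child.find("name").text

print(len(items))  # number of child elements in the XML
print(by_tag)      # only one entry is left in the dict
```

The element has two children, but the dict ends up with a single item key holding the last child seen.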

One possible way to resolve these problems is to use collections.defaultdict. The code might look like this:

from collections import defaultdict
from lxml import objectify


def xml_to_dict(xml_str):
    def xml_to_dict_recursion(xml_object):
        # Group children by tag so that repeated tags are all kept.
        dict_object = defaultdict(list)
        # An element without child elements is returned as-is.
        if not xml_object.__dict__:
            return xml_object
        for child in xml_object.getchildren():
            dict_object[child.tag].append(xml_to_dict_recursion(child))
        return dict_object
    return xml_to_dict_recursion(objectify.fromstring(xml_str))


if __name__ == "__main__":
    contents = open('input.xml').read()
    xml_dict = xml_to_dict(contents)
    for value in xml_dict['item'][0]['items'][0]['item']:
        print(dict(value))

In this case the output is:

{'uom': ['Unit(s)'], 'default_code': [''], 'description': ['False'], 'name': ['iPod'], 'weight': [0.0], 'list_price': [16.5], 'id': [18]}
{'uom': ['Unit(s)'], 'default_code': ['M-Wir'], 'description': ['False'], 'name': ['Mouse, Wireless'], 'weight': [0.0], 'list_price': [12.5], 'id': [19]}
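Because every level of this structure is a list, nested loops can visit all orders and all of their items without hard-coded indexes. The following is only a sketch under stated assumptions: element_to_dict is a hypothetical stand-in for xml_to_dict built on the standard library's xml.etree.ElementTree (so leaf values are plain strings, not objectify's typed values), and the second order, Test/002, is made-up sample data.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

# Hypothetical sample: the first order is cut down from the question,
# the second one (Test/002) is invented for illustration.
SAMPLE = """
<Order>
  <item>
    <name>Test/001</name>
    <items>
      <item><id>18</id><name>iPod</name></item>
      <item><id>19</id><name>Mouse, Wireless</name></item>
    </items>
  </item>
  <item>
    <name>Test/002</name>
    <items>
      <item><id>18</id><name>iPod</name></item>
    </items>
  </item>
</Order>
"""


def element_to_dict(element):
    """Stand-in for xml_to_dict: group child elements by tag."""
    if len(element) == 0:          # leaf element: return its text
        return element.text or ""
    result = defaultdict(list)
    for child in element:
        result[child.tag].append(element_to_dict(child))
    return result


xml_dict = element_to_dict(ET.fromstring(SAMPLE.strip()))

rows = []
for order in xml_dict["item"]:          # every order
    for items in order["items"]:        # every <items> block in it
        for item in items["item"]:      # every item in that block
            rows.append((order["name"][0], item["name"][0]))

for row in rows:
    print(row)
```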

But in my opinion this approach is not very convenient. A more comfortable way is to work with the XML document directly through lxml.objectify (see the docs). For example:

from lxml import objectify

tree = objectify.parse('input.xml')
order = tree.getroot()
order_items = order.getchildren()
for order_item in order_items:
    print(order_item['amount_untaxed'])
    customer = order_item['customer']
    print(customer['item']['city'])
    for item in order_item['items'].getchildren():
        print(item['list_price'])

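The same direct traversal can also be sketched with path lookups. This is not the code from the answer: it assumes the standard library's xml.etree.ElementTree so that it runs without extra packages (lxml.etree elements offer the same find, findall and findtext methods), and the second order in the sample is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: Test/001 is cut down from the question,
# Test/002 is made up to show a second order.
SAMPLE = """
<Order>
  <item>
    <name>Test/001</name>
    <items>
      <item><list_price>16.5</list_price><name>iPod</name></item>
      <item><list_price>12.5</list_price><name>Mouse, Wireless</name></item>
    </items>
  </item>
  <item>
    <name>Test/002</name>
    <items>
      <item><list_price>16.5</list_price><name>iPod</name></item>
    </items>
  </item>
</Order>
"""

order_root = ET.fromstring(SAMPLE.strip())

lines = []
# Direct children named "item" are the orders.
for order in order_root.findall("item"):
    order_name = order.findtext("name")
    # "items/item" selects every item inside this order's <items> block.
    for item in order.findall("items/item"):
        price = float(item.findtext("list_price"))
        lines.append((order_name, item.findtext("name"), price))

for line in lines:
    print(line)
```

Unlike objectify, ElementTree returns text values as strings, so the price is converted with float explicitly.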
Eduard Stepanov
  • Cool, thanks, this works. Now, because this will be a multiple-order import system, I will have more than one order item, each with its own items element. I can access them with ` for value in xml_dict['item'][0]['items'][0]['item']: #print value['list_price'] print(dict(value)) for value in xml_dict['item'][1]['items'][0]['item']: #print value['list_price'] print(dict(value))` But how do I loop through all of them without manually writing xml_dict['item'][1], xml_dict['item'][2] or xml_dict['item'][3]? Thanks in advance – The Agri CULTURE Guy Jun 28 '17 at 06:59
  • Do you really need to convert the XML object to a dictionary? In my opinion, working with `lxml` methods is more convenient. – Eduard Stepanov Jun 28 '17 at 07:20
  • okay so in short, how would I access all orders and items in orders without converting the xml to dict? thanks a lot – The Agri CULTURE Guy Jun 28 '17 at 10:28
  • I've updated my comment with example of parsing your xml file. – Eduard Stepanov Jun 29 '17 at 08:26