How can I use xmltodict to get items out of an XML file?

Question

I am trying to easily access values from an XML file like this:

<artikelen>
    <artikel nummer="121">
        <code>ABC123</code>
        <naam>Highlight pen</naam>
        <voorraad>231</voorraad>
        <prijs>0.56</prijs>
    </artikel>
    <artikel nummer="123">
        <code>PQR678</code>
        <naam>Nietmachine</naam>
        <voorraad>587</voorraad>
        <prijs>9.99</prijs>
    </artikel>
..... etc

How would I go about getting access to the value ABC123?

import xmltodict

with open('8_1.html') as fd:
    doc = xmltodict.parse(fd.read())
    print(doc[fd]['code'])

score 36 · Answer 1 · edited Aug 15 '23 at 16:45

Using your example:

import xmltodict

with open('artikelen.xml') as fd:
    doc = xmltodict.parse(fd.read())

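If you examine doc, you'll see it's an OrderedDict that keeps the elements in document order (newer xmltodict releases on Python 3.7+ return a plain dict with the same layout):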

>>> doc
OrderedDict([('artikelen',
              OrderedDict([('artikel',
                            [OrderedDict([('@nummer', '121'),
                                          ('code', 'ABC123'),
                                          ('naam', 'Highlight pen'),
                                          ('voorraad', '231'),
                                          ('prijs', '0.56')]),
                             OrderedDict([('@nummer', '123'),
                                          ('code', 'PQR678'),
                                          ('naam', 'Nietmachine'),
                                          ('voorraad', '587'),
                                          ('prijs', '9.99')])])]))])

The root node is called artikelen, and its artikel entry is a list of OrderedDict objects, one per <artikel> element. If you want the code for every article, you would do:

codes = []
for artikel in doc['artikelen']['artikel']:
    codes.append(artikel['code'])

# >>> codes
# ['ABC123', 'PQR678']
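One caveat with the loop above: xmltodict only produces a list when a tag repeats, so a file with a single artikel element would give a dict there and the loop would iterate over its keys. A minimal sketch of one way around that, assuming the force_list argument of xmltodict.parse; the SINGLE sample and the artikel_codes helper are made up for illustration:

```python
import xmltodict

# Hypothetical sample with only one <artikel>, mirroring the question's layout.
SINGLE = """<artikelen>
    <artikel nummer="121">
        <code>ABC123</code>
    </artikel>
</artikelen>"""


def artikel_codes(xml_text):
    """Return the <code> text of every <artikel>, even if there is only one."""
    # force_list makes 'artikel' a list regardless of how often the tag occurs.
    doc = xmltodict.parse(xml_text, force_list=('artikel',))
    return [artikel['code'] for artikel in doc['artikelen']['artikel']]


codes = artikel_codes(SINGLE)
first_code = codes[0]  # the value the question asks for
```

With the list guaranteed, indexing the first article (`doc['artikelen']['artikel'][0]['code']`) is also safe.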

If you specifically want the code only when nummer is 121, you could do this:

code = None
for artikel in doc['artikelen']['artikel']:
    if artikel['@nummer'] == '121':
        code = artikel['code']
        break

That said, if you're parsing XML documents and want to search for a specific value like that, I would consider using XPath expressions; ElementTree supports a subset of them.
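As a rough sketch of that XPath route with the standard library's xml.etree.ElementTree: the SAMPLE string and the code_for helper below are assumptions for illustration, and in practice you would parse the file with ET.parse(...).getroot() instead.

```python
import xml.etree.ElementTree as ET

# Inline copy of the question's data so the sketch is self-contained.
SAMPLE = """<artikelen>
    <artikel nummer="121">
        <code>ABC123</code>
        <naam>Highlight pen</naam>
    </artikel>
    <artikel nummer="123">
        <code>PQR678</code>
        <naam>Nietmachine</naam>
    </artikel>
</artikelen>"""

root = ET.fromstring(SAMPLE)  # root is the <artikelen> element


def code_for(root, nummer):
    """Return the <code> text of the artikel with the given nummer, or None."""
    # [@nummer='...'] is an attribute predicate from ElementTree's XPath subset.
    node = root.find(f"./artikel[@nummer='{nummer}']/code")
    return node.text if node is not None else None


print(code_for(root, "121"))
```

Attribute values are compared as strings here, so the nummer is passed as "121" and not as an integer.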

score -1 · Answer 2 · Chaitanya Sama · 2016-10-20T14:59:43.770

This uses xml.etree. You can try this:

import xml.etree.ElementTree as ET
root = ET.parse('artikelen.xml').getroot()
for artikelobj in root.findall('artikel'):
    print(artikelobj.find('code').text)

If you want to extract a specific code based on the nummer attribute of artikel, you can try this:

for artikelobj in root.findall('artikel'):
    if artikelobj.get('nummer') == '121':  # attribute values are strings
        print(artikelobj.find('code').text)

This will print only the code you want.

edited Oct 20 '16 at 14:59

answered Oct 20 '16 at 14:51

Chaitanya Sama


score -2 · Answer 3 · answered Oct 20 '16 at 15:35

You can use the lxml package with an XPath expression.

from lxml import etree

tree = etree.parse("8_1.html")
expression = "/artikelen/artikel[1]/code"
matches = tree.xpath(expression)  # list of matching <code> elements
code = next(m.text for m in matches)
print(code)

# ABC123

The thing to notice here is the expression. /artikelen is the root element. /artikel[1] selects the first artikel element under the root (note that XPath positions start at 1, not 0). /code is the child element of artikel[1]. You can read more in the lxml documentation and in references on XPath syntax.

score -3 · Answer 4 · answered Oct 20 '16 at 12:43

To read .xml files:

from lxml import objectify  # objectify, unlike lxml.etree, allows attribute-style child access
root = objectify.parse(filename).getroot()
value = root.node1.node2.variable_name.text

Chr

In your example: `result = root.artikel.code.text` – Chr Oct 20 '16 at 12:43
The import is not correct because Python gives an error – 54m Oct 20 '16 at 13:25
What's the error message? Did you install the lxml package? – Chr Oct 20 '16 at 13:28
