LXML, how to get multiple set of attributes to lists

Question

I have a problem similar to this one:

How do I select multiple sets of attributes within an XML document using XPath?

My XML data looks like this:

<?xml version="1.0" encoding="utf-8"?>
<Basic>
    <Segment>
        <Sample value="12" data2="25" data3="23"/>
        <Sample value="13" data2="0" data3="323"/>
        <Sample value="14" data2="2" data3="3"/>
    </Segment>
</Basic>

What's the simplest Python way to get those dataX values into lists?

For example: data2 = ['25', '0', '2']

reclosedev · Accepted Answer · 2011-12-30T14:22:40.293

5

With xpath:

from lxml import etree
from collections import defaultdict
from pprint import pprint

doc="""<?xml version="1.0" encoding="utf-8"?>
<Basic>
    <Segment>
        <Sample value="12" data2="25" data3="23"/>
        <Sample value="13" data2="0" data3="323"/>
        <Sample value="14" data2="2" data3="3"/>
    </Segment>
</Basic>
"""
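# Parse bytes: on Python 3, lxml rejects a str that carries an encoding declaration
el = etree.fromstring(doc.encode("utf-8"))
data2 = el.xpath('//@data2')
dataX = el.xpath('//@*[starts-with(name(), "data")]')
print(data2)
print(dataX)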

# With iteration over Sample elements, as in J.F. Sebastian's answer, but with XPath
d = defaultdict(list)
for sample in el.xpath('//Sample'):
    for attr_name, attr_value in sample.items():
        d[attr_name].append(attr_value)

pprint(dict(d))

Output:

['25', '0', '2']
['25', '23', '0', '323', '2', '3']
{'data2': ['25', '0', '2'],
 'data3': ['23', '323', '3'],
 'value': ['12', '13', '14']}

edited Dec 30 '11 at 14:22

answered Dec 30 '11 at 13:33

reclosedev


you could use `el.iter('Sample')` instead of `el.xpath('//Sample')`. – jfs Dec 30 '11 at 15:23
@J.F.Sebastian, you are right, but by using xpath you can add extra logic in one line, e.g. `el.xpath('//Sample[@value < 14]')` to select only elements with value less than 14, etc. – reclosedev Dec 30 '11 at 15:26
I ended up using this, specifically the for-loop part. That works well for me because the XML file does not always have all the attributes. The other answer would also work, but I find this one simpler. – Dec 31 '11 at 09:00
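
The trade-off discussed in the comments above can be sketched as follows. This is a minimal illustration that reuses the question's sample data, not part of the original answer: `iter()` simply walks every Sample element, while an XPath predicate filters in the same expression.

```python
from lxml import etree

doc = b"""<Basic>
    <Segment>
        <Sample value="12" data2="25" data3="23"/>
        <Sample value="13" data2="0" data3="323"/>
        <Sample value="14" data2="2" data3="3"/>
    </Segment>
</Basic>"""

root = etree.fromstring(doc)

# iter() walks the tree and yields every element with the given tag.
all_data2 = [sample.get("data2") for sample in root.iter("Sample")]

# An XPath predicate filters in the same expression; the attribute string
# is converted to a number for the comparison.
filtered_data2 = [s.get("data2") for s in root.xpath("//Sample[@value < 14]")]

print(all_data2)       # ['25', '0', '2']
print(filtered_data2)  # ['25', '0']
```

If no filtering is needed, `iter()` is probably the lighter choice; the XPath form pays off once conditions like the one above are involved.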

ccpizza · Answer 2 · 2018-08-02T10:28:28.260

1

The simplest way to get attribute values is using etree.Element.get('attr_name'):

from lxml import etree

s = '''<?xml version="1.0" encoding="utf-8"?>
<Basic>
    <Segment>
        <Sample value="12" data2="25" data3="23"/>
        <Sample value="13" data2="0" data3="323"/>
        <Sample value="14" data2="2" data3="3"/>
    </Segment>
</Basic>'''

# ❗️for python2
# tree = etree.fromstring(s)

# ❗️for python3
tree = etree.fromstring(s.encode("utf-8"))

samples = tree.xpath('//Sample')

print([sample.get('data2') for sample in samples])
# Output: ['25', '0', '2']

edited Aug 02 '18 at 10:28

answered Mar 21 '16 at 21:55

ccpizza


I would really like to use get, but I cannot get it to work with a namespaced attribute. For this example it would be: xml:data2="25". With XPath I get too many or too few results. Any idea how to make get work with a namespace? – MisterT Sep 04 '19 at 14:43
@MisterT: You need to get the namespace map first with `ns = tree.getroot().nsmap` and then pass the namespace to your xpath call `tree.xpath(xpath_expression, namespaces=ns)` – ccpizza Sep 04 '19 at 14:59
Thanks, this still pushed me in the right direction: https://stackoverflow.com/questions/31063541/how-to-get-an-attribute-of-an-element-that-is-namespaced. The only downside I see is that I have to loop over each child individually instead of just the parent. – MisterT Sep 06 '19 at 11:20
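
For the namespaced-attribute case raised in these comments, a minimal sketch might look like the following. The namespace URI `http://example.com/ns` and the prefix `ex` are hypothetical, chosen only for illustration. `get()` takes the attribute name in Clark notation (`{uri}localname`), while XPath takes a prefix mapped through the `namespaces` argument.

```python
from lxml import etree

# Hypothetical namespace URI and prefix, used only for this illustration.
NS = "http://example.com/ns"

doc = b"""<Basic xmlns:ex="http://example.com/ns">
    <Segment>
        <Sample value="12" ex:data2="25"/>
        <Sample value="13" ex:data2="0"/>
        <Sample value="14"/>
    </Segment>
</Basic>"""

root = etree.fromstring(doc)

# get() expects the qualified name in Clark notation: {namespace-uri}localname.
# Elements that lack the attribute yield None.
via_get = [s.get("{%s}data2" % NS) for s in root.iter("Sample")]

# XPath uses a prefix that is mapped to the URI via the namespaces argument.
# Only attributes that actually exist are returned.
via_xpath = root.xpath("//Sample/@ex:data2", namespaces={"ex": NS})

print(via_get)    # ['25', '0', None]
print(via_xpath)  # ['25', '0']
```

An explicit prefix map is used here rather than `nsmap` because, as far as I know, a document with a default namespace puts a `None` key into `nsmap`, which lxml's XPath does not accept.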

score 0 · Answer 3 · answered Dec 30 '11 at 10:27

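Using ElementTree from the stdlib (cElementTree, its former C-accelerated alias, was removed in Python 3.9):

import sys
from collections import defaultdict
from pprint import pprint
from xml.etree import ElementTree as etree

d = defaultdict(list)
# iterparse yields each element once its closing tag has been read
for ev, el in etree.iterparse(sys.stdin):
    if el.tag == 'Sample':
        for name in "value data2 data3".split():
            d[name].append(el.get(name))
pprint(dict(d))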

Output

{'data2': ['25', '0', '2'],
 'data3': ['23', '323', '3'],
 'value': ['12', '13', '14']}

If you use lxml.etree, you can call etree.iterparse(file, tag='Sample') to make iterparse() yield only Sample elements; in that case you can drop the if el.tag == 'Sample' condition.
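
A minimal sketch of that lxml variant, assuming the XML is available as bytes (an in-memory `BytesIO` stands in for the file here). It collects whatever attributes each Sample carries instead of a fixed list of names:

```python
from collections import defaultdict
from io import BytesIO

from lxml import etree

xml_bytes = b"""<?xml version="1.0" encoding="utf-8"?>
<Basic>
    <Segment>
        <Sample value="12" data2="25" data3="23"/>
        <Sample value="13" data2="0" data3="323"/>
        <Sample value="14" data2="2" data3="3"/>
    </Segment>
</Basic>"""

d = defaultdict(list)
# tag="Sample" makes iterparse yield only Sample elements (default event: "end").
for _event, el in etree.iterparse(BytesIO(xml_bytes), tag="Sample"):
    for name, value in el.items():
        d[name].append(value)
    el.clear()  # drop the element's content once processed; helps with large files

print(dict(d))
```

Iterating over `el.items()` also copes with Sample elements that lack some attributes, since only the attributes present are appended.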
