
I have a problem somewhat similar to this one:

How do I select multiple sets of attributes within an XML document using XPath?

My XML data looks like this:

<?xml version="1.0" encoding="utf-8"?>
<Basic>
    <Segment>
        <Sample value="12" data2="25" data3="23"/>
        <Sample value="13" data2="0" data3="323"/>
        <Sample value="14" data2="2" data3="3"/>
    </Segment>
</Basic>

What's the simplest Python way to get those dataX values into lists?

For example: data2 = ['25', '0', '2']

Community

3 Answers


With xpath:

from lxml import etree
from collections import defaultdict
from pprint import pprint

# Bytes literal: on Python 3, lxml rejects a str that carries an encoding declaration
doc = b"""<?xml version="1.0" encoding="utf-8"?>
<Basic>
    <Segment>
        <Sample value="12" data2="25" data3="23"/>
        <Sample value="13" data2="0" data3="323"/>
        <Sample value="14" data2="2" data3="3"/>
    </Segment>
</Basic>
"""
el = etree.fromstring(doc)
data2 = el.xpath('//@data2')
dataX = el.xpath('//@*[starts-with(name(), "data")]')
print(data2)
print(dataX)

# Iterating over the Sample elements, as in J.F. Sebastian's answer, but with XPath
d = defaultdict(list)
for sample in el.xpath('//Sample'):
    for attr_name, attr_value in sample.items():
        d[attr_name].append(attr_value)

pprint(dict(d))

Output:

['25', '0', '2']
['25', '23', '0', '323', '2', '3']
{'data2': ['25', '0', '2'],
 'data3': ['23', '323', '3'],
 'value': ['12', '13', '14']}
reclosedev
  • you could use `el.iter('Sample')` instead of `el.xpath('//Sample')`. – jfs Dec 30 '11 at 15:23
  • @J.F.Sebastian, you are right, but by using xpath you can add extra logic in one line, e.g. `el.xpath('//Sample[@value < 14]')` to select only elements with value less than 14, etc. – reclosedev Dec 30 '11 at 15:26
  • I ended up using this, specifically the for-loop part. That works well for me because the XML file does not always have all the attributes. The other answer would also work, but I find this one simpler. –  Dec 31 '11 at 09:00
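
On the point in the last comment about files where not every Sample carries every attribute, here is a minimal sketch of the two behaviours you can choose between. It uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra installs; the input (the second Sample lacks data3) and the helper names `collect_present` and `collect_aligned` are made up for illustration.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

# Hypothetical input: the second Sample has no data3 attribute.
DOC = b"""<?xml version="1.0" encoding="utf-8"?>
<Basic>
    <Segment>
        <Sample value="12" data2="25" data3="23"/>
        <Sample value="13" data2="0"/>
        <Sample value="14" data2="2" data3="3"/>
    </Segment>
</Basic>
"""


def collect_present(root):
    """Collect only attributes that exist; the lists may differ in length."""
    d = defaultdict(list)
    for sample in root.iter('Sample'):
        for name, value in sample.items():
            d[name].append(value)
    return dict(d)


def collect_aligned(root, names):
    """Keep one entry per Sample; a missing attribute becomes None."""
    samples = list(root.iter('Sample'))
    return {name: [s.get(name) for s in samples] for name in names}


root = ET.fromstring(DOC)
present = collect_present(root)
aligned = collect_aligned(root, ['value', 'data2', 'data3'])
print(present['data3'])  # ['23', '3']
print(aligned['data3'])  # ['23', None, '3']
```

The for-loop variant silently skips missing attributes, so positions in different lists no longer line up; the `get`-based variant keeps the lists aligned at the cost of `None` placeholders.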

The simplest way to get attribute values is using etree.Element.get('attr_name'):

from lxml import etree

s = '''<?xml version="1.0" encoding="utf-8"?>
<Basic>
    <Segment>
        <Sample value="12" data2="25" data3="23"/>
        <Sample value="13" data2="0" data3="323"/>
        <Sample value="14" data2="2" data3="3"/>
    </Segment>
</Basic>'''

# ❗️for python2
# tree = etree.fromstring(s)

# ❗️for python3
tree = etree.fromstring(s.encode("utf-8"))

samples = tree.xpath('//Sample')

print([sample.get('data2') for sample in samples])
# Output: ['25', '0', '2']
ccpizza
  • I would really like to use get, but I cannot get it to work with a namespaced attribute. For this example it would be: xml:data2="25". With xpath I get too many or too few results. Any idea how to make get work with a namespace? – MisterT Sep 04 '19 at 14:43
  • @MisterT: You need to get the namespace map first with `ns = tree.getroot().nsmap` and then pass the namespace map to your xpath call `tree.xpath(xpath_expression, namespaces=ns)` – ccpizza Sep 04 '19 at 14:59
  • Thanks, this still pushed me in the right direction: https://stackoverflow.com/questions/31063541/how-to-get-an-attribute-of-an-element-that-is-namespaced. The only downside I see is that I have to loop over each child individually instead of just the parent. – MisterT Sep 06 '19 at 11:20
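
For the namespaced-attribute case raised in these comments, a minimal sketch: `get` can be given the attribute name in `{namespace-uri}localname` form. The example uses the standard library's xml.etree.ElementTree so it runs anywhere; lxml's `get` accepts the same form. The namespace URI `http://example.com/ns` and the prefix `x` are placeholders, not taken from the question.

```python
import xml.etree.ElementTree as ET

NS = 'http://example.com/ns'  # hypothetical namespace URI

DOC = b"""<?xml version="1.0" encoding="utf-8"?>
<Basic xmlns:x="http://example.com/ns">
    <Segment>
        <Sample value="12" x:data2="25"/>
        <Sample value="13" x:data2="0"/>
        <Sample value="14" x:data2="2"/>
    </Segment>
</Basic>
"""

root = ET.fromstring(DOC)

# Namespaced attributes are stored under '{uri}localname', not 'prefix:localname'.
qualified = '{%s}data2' % NS
values = [s.get(qualified) for s in root.iter('Sample')]
print(values)

# The unqualified name does not match the namespaced attribute.
plain = [s.get('data2') for s in root.iter('Sample')]
print(plain)
```

With lxml, the XPath route would presumably look like `tree.xpath('//@x:data2', namespaces={'x': NS})`. Note that in the answer above `tree` is already the root element, so the namespace map would be `tree.nsmap` rather than `tree.getroot().nsmap`.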

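Using ElementTree from the stdlib (cElementTree on Python 2; it was removed in Python 3.9):

import sys
from collections import defaultdict
from pprint import pprint
from xml.etree import ElementTree as etree

d = defaultdict(list)
# iterparse yields each element once its end tag has been read
for ev, el in etree.iterparse(sys.stdin):
    if el.tag == 'Sample':
        for name in "value data2 data3".split():
            d[name].append(el.get(name))
pprint(dict(d))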

Output:

{'data2': ['25', '0', '2'],
 'data3': ['23', '323', '3'],
 'value': ['12', '13', '14']}

If you use lxml.etree, you can call etree.iterparse(file, tag='Sample') so that iterparse() yields only Sample elements, i.e., you can drop the if el.tag == 'Sample' condition in this case.
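
A self-contained variant of the same idea, as a sketch: it reads from an in-memory byte stream instead of sys.stdin so it can be run as-is, and it clears each element after use, which helps keep memory down on large files. It sticks to the standard library; the function name `collect` is made up for illustration.

```python
import io
from collections import defaultdict
import xml.etree.ElementTree as ET

DOC = b"""<?xml version="1.0" encoding="utf-8"?>
<Basic>
    <Segment>
        <Sample value="12" data2="25" data3="23"/>
        <Sample value="13" data2="0" data3="323"/>
        <Sample value="14" data2="2" data3="3"/>
    </Segment>
</Basic>
"""


def collect(stream, tag='Sample'):
    """Stream through the XML and gather attribute values per attribute name."""
    d = defaultdict(list)
    # With the default events=('end',), each element arrives once it is complete.
    for _event, el in ET.iterparse(stream):
        if el.tag == tag:
            for name, value in el.items():
                d[name].append(value)
            el.clear()  # discard the element's content once it has been read
    return dict(d)


result = collect(io.BytesIO(DOC))
print(result)
```

With lxml.etree, the `if el.tag == tag` check could be replaced by passing `tag='Sample'` to `iterparse`, as described above.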

jfs