3

Suppose this is my XML:

<animals>
   <mammals> 
      <an>dog</an>
      <an>cat</an>
   </mammals>
   <reptiles>
      <an>snake</an>
   </reptiles>
</animals>

What I want is to get tuples like these using XPath:

(mammals,dog)
(mammals,cat)
(reptiles,snake)

Getting each part separately, or both of them with two queries, is easy. I was wondering whether there is a way to get this (or very similar output) with a single XPath query.

Any help will be appreciated!

Binyamin Even
  • Here's a lead to follow https://stackoverflow.com/questions/21996965/concatenate-multiple-node-values-in-xpath – LMC Jan 24 '18 at 12:48
  • @LuisMuñoz - in that question the elements are on the same level. Here they are not. – Binyamin Even Jan 24 '18 at 12:50
  • the first element in the tuple is the tag name, not an element itself and can be obtained with xpath. – LMC Jan 24 '18 at 12:54

4 Answers

3

Use lxml:

from io import StringIO

from lxml import etree

xml = """<animals>
   <mammals> 
      <an>dog</an>
      <an>cat</an>
   </mammals>
   <reptiles>
      <an>snake</an>
   </reptiles>
</animals>"""

tree = etree.parse(StringIO(xml))

for x in tree.xpath("/animals/*"):
    for y in x:
        print((x.tag, y.text))

Output:

('mammals', 'dog')
('mammals', 'cat')
('reptiles', 'snake')
2

In XPath 2.0 or above you can use a for expression:

for $x in /animals/*/*
return concat($x/parent::*/name(), ',', $x/text())

But lxml only supports XPath 1.0, so there the XPath for expression has to be replaced with a Python for loop:

from lxml import etree

raw = """<animals>
   <mammals> 
      <an>dog</an>
      <an>cat</an>
   </mammals>
   <reptiles>
      <an>snake</an>
   </reptiles>
</animals>"""
root = etree.fromstring(raw)

for x in root.xpath("/animals/*/*"):
    # Double parentheses so Python 3 prints a tuple, e.g. ('mammals', 'dog')
    print((x.getparent().tag, x.text))
har07
1

Try the xml.etree.ElementTree module from Python's standard library:

from xml.etree import ElementTree

def parse_data(xml_str):
    output = []
    tree = ElementTree.fromstring(xml_str)
    # Iterate over elements directly; getchildren() was removed in Python 3.9.
    for m in tree:
        for n in m:
            output.append((m.tag, n.text))
    return output

xml_str = '''
<animals>
   <mammals> 
      <an>dog</an>
      <an>cat</an>
   </mammals>
   <reptiles>
      <an>snake</an>
   </reptiles>
</animals>'''

print(parse_data(xml_str))
# output: [('mammals', 'dog'), ('mammals', 'cat'), ('reptiles', 'snake')]
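
The nested loops can also be written as one path query plus a parent lookup. This is only a minimal sketch using the standard library; ElementTree elements have no getparent(), so the parent_of dictionary is an illustrative helper introduced here, not part of the code above:

```python
import xml.etree.ElementTree as ET

xml_str = """<animals>
   <mammals>
      <an>dog</an>
      <an>cat</an>
   </mammals>
   <reptiles>
      <an>snake</an>
   </reptiles>
</animals>"""

root = ET.fromstring(xml_str)

# Map every child element to its parent (hypothetical helper name).
parent_of = {child: parent for parent in root.iter() for child in parent}

# One path query selects all <an> grandchildren of the root.
pairs = [(parent_of[an].tag, an.text) for an in root.findall("./*/an")]

print(pairs)
```

Building the map costs one extra pass over the tree, which should be negligible for a document of this size.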
anjaneyulubatta505
0

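This XPath returns the requested string, but only for the first element: in XPath 1.0, concat() converts each node-set argument to the string value of its first node in document order. Getting every pair is probably hard to do with pure XPath 1.0.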

'concat("(", local-name(//animals/*), ",", //animals/*/an/text(), ")")'

xmllint --xpath 'concat("(", local-name(//animals/*), ",", //animals/*/an/text(), ")")' ~/tmp/test.xml
(mammals,dog)
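
One possible way around that limit, sketched here on the assumption that lxml is installed, is to select the an elements with one query and then evaluate a relative concat() against each of them. The variable names first_only and pairs are illustrative only:

```python
from lxml import etree

xml = """<animals>
   <mammals>
      <an>dog</an>
      <an>cat</an>
   </mammals>
   <reptiles>
      <an>snake</an>
   </reptiles>
</animals>"""

root = etree.fromstring(xml)

# Evaluated once for the whole document, concat() only uses the first
# node of each node-set argument.
first_only = root.xpath(
    'concat("(", local-name(/animals/*), ",", /animals/*/an, ")")'
)

# Evaluated relative to each <an>, ".." is that element's own parent
# and "." is the element itself.
pairs = [
    an.xpath('concat("(", local-name(..), ",", ., ")")')
    for an in root.xpath("/animals/*/an")
]

print(first_only)
for pair in pairs:
    print(pair)
```

This still needs a Python loop, so it is not a single XPath query; it only moves the string formatting into XPath.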
LMC