
The title says it all. I want to find every element (say, Node1) where the string "Hello World!" appears in its own text or in the text of any of its descendants. If the string appears in a descendant, I still want to get the Node1 ancestor, not that descendant.

<?xml version="1.0"?>
<data>
    <Node1 id="Node1">Hello World! from Node1
    </Node1>

    <Node1 id="Node2">Nothing to see here
    </Node1>

    <Node1 id="Node3">
        Some text goes here
        <Node2>
            More text
            <Node3>Hello World! from Node3 </Node3>
        </Node2>
    </Node1>
</data>
sean

2 Answers

0

With ElementTree, I think you could do something like this:

import xml.etree.ElementTree as etree

s = """<root>
<element>A</element>
  <element2>C</element2>
    <element3>TEST</element3>
<element>B</element>
  <element2>D</element2>
    <element3>Test</element3>
</root>"""

e = etree.fromstring(s)

# Exact, case-sensitive match on each element's own text
found = [element for element in e.iter() if element.text == 'Test']

print(found[0])

Returns:

<Element 'element3' at 0x7f9edb7e7a98>


Physicing
  • Thanks. But this approach only returns the immediate node, e.g. Node3 in my example, instead of the Node1 that I want to search for. – sean Oct 02 '19 at 22:01
0

See below

import xml.etree.ElementTree as ET

xml = '''<?xml version="1.0"?>
<data>
    <Node1 id="Node1">Hello World! from Node1
    </Node1>

    <Node1 id="Node2">Nothing to see here
    </Node1>

    <Node1 id="Node3">
        Some text goes here
        <Node2>
            More text
            <Node3>Hello World! from Node3 </Node3>
        </Node2>
    </Node1>
</data>'''


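def scan_node(node, txt, result):
    """
    Recursively scan the descendants of 'node' and collect every element
    whose own text contains 'txt'.
    :param node: element whose descendants are scanned
    :param txt: substring to look for
    :param result: list that matching elements are appended to
    :return: None (matches are accumulated in 'result')
    """
    for child in node:
        # child.text is None for elements without text, so guard against it
        if child.text and txt in child.text:
            result.append(child)
        scan_node(child, txt, result)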


root = ET.fromstring(xml)
result = []
scan_node(root, 'Hello World', result)
print(result)

Output:

[<Element 'Node1' at 0x00723A80>, <Element 'Node3' at 0x00723C30>]
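Note that the list above contains the inner Node3 element, not the Node1 ancestor the question asks for. Here is a minimal sketch of one way to get the ancestor instead. It assumes that only Node1 elements are wanted and that it is acceptable to match against the concatenated text of the whole subtree. find_by_subtree_text is a hypothetical helper name, not part of ElementTree; Element.iter() and Element.itertext() are the standard calls it relies on.

```python
import xml.etree.ElementTree as ET

xml = '''<data>
    <Node1 id="Node1">Hello World! from Node1
    </Node1>
    <Node1 id="Node2">Nothing to see here
    </Node1>
    <Node1 id="Node3">
        Some text goes here
        <Node2>
            More text
            <Node3>Hello World! from Node3 </Node3>
        </Node2>
    </Node1>
</data>'''


def find_by_subtree_text(root, tag, txt):
    """Hypothetical helper: return every `tag` element under `root` whose
    own text, or the text of any descendant, contains `txt`."""
    # itertext() yields the text and tail strings of the whole subtree
    # in document order; joining them gives one searchable string.
    return [el for el in root.iter(tag) if txt in "".join(el.itertext())]


root = ET.fromstring(xml)
matches = find_by_subtree_text(root, 'Node1', 'Hello World!')
print([el.get('id') for el in matches])  # ['Node1', 'Node3']
```

Because the subtree text is joined into one string, a search string that happens to span two adjacent elements would also match. If that matters, testing each piece from itertext() separately with any() may be a safer variant.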
balderman