I have an XML file similar to this:
<root>
<a>Some <b>bad</b> text <i>that</i> I <u>do <i>not</i></u> want to keep.</a>
</root>
I want to remove all text in <b> or <u> elements (and their descendants), and print the rest. This is what I tried:
from __future__ import print_function
import xml.etree.ElementTree as ET
tree = ET.parse('a.xml')
root = tree.getroot()
parent_map = {c:p for p in root.iter() for c in p}
for item in root.findall('.//b'):
    parent_map[item].remove(item)
for item in root.findall('.//u'):
    parent_map[item].remove(item)
print(''.join(root.itertext()).strip())
(I used the recipe in this answer to build the parent_map.) The problem, of course, is that remove(item) also removes the text that follows the element (ElementTree stores it in the element's tail attribute), and the result is:
Some that I
whereas what I want is:
Some text that I want to keep.
Is there any solution?
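One possible direction, given as a minimal sketch rather than a definitive fix: since the text after an element lives in its tail, the tail can be copied onto the previous sibling's tail (or onto the parent's text when there is no previous sibling) before the element is removed. The helper name remove_keep_tail is hypothetical, the XML is inlined instead of being read from a.xml, and the final whitespace normalization is an assumption made so that the output matches the single-spaced result asked for above.

```python
import xml.etree.ElementTree as ET

XML = """<root>
<a>Some <b>bad</b> text <i>that</i> I <u>do <i>not</i></u> want to keep.</a>
</root>"""


def remove_keep_tail(root, tags):
    """Remove every element whose tag is in `tags`, but keep its tail text.

    Hypothetical helper: the tail of a removed element is appended to the
    previous sibling's tail, or to the parent's text if it was the first child.
    """
    # Snapshot the elements first so removals do not disturb the iteration.
    for parent in list(root.iter()):
        previous = None  # last child of `parent` that was kept
        for child in list(parent):
            if child.tag in tags:
                tail = child.tail or ''
                if previous is None:
                    parent.text = (parent.text or '') + tail
                else:
                    previous.tail = (previous.tail or '') + tail
                parent.remove(child)
            else:
                previous = child


root = ET.fromstring(XML)
remove_keep_tail(root, {'b', 'u'})

# Assumption: collapse runs of whitespace left where elements were cut out.
text = ' '.join(''.join(root.itertext()).split())
print(text)
```

Because each parent's children are walked directly, this version does not need the parent_map at all.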