I am looking for a way to remove a specific <e> element whose text is mmm (i.e. <e>mmm</e>) from an XML file. I am using this thread as a starting guide: "How to remove elements from XML using Python". However, I am not using the lxml library; I am using ElementTree with Python v2.6.6 instead. I have tried to connect the dots between that thread and the ElementTree API docs, but I haven't been successful.

I would appreciate your advice and thoughts on this.

<?xml version='1.0' encoding='UTF-8'?>
<parent>
   <first>
     <a>123</a>                              
     <c>987</c>
       <d>
         <e>mmm</e>
         <e>yyy</e>           
       </d>         
   </first>
   <second>
     <a>456</a>                      
     <c>345</c>
       <d>
         <e>mmm</e>
         <e>hhh</e>            
       </d>
   </second>
 </parent>
Queuebee
DaeYoung

2 Answers

It took a while for me to realise all <e> tags are subnodes of <d>.

If we can assume that this is true for all your target nodes (<e> nodes with the text mmm), you can use this script. (I added some extra nodes to check that it works.)

import xml.etree.ElementTree as ET

xml_string = """<?xml version='1.0' encoding='UTF-8'?>
<parent>
   <first>
     <a>123</a>                              
     <c>987</c>
       <d>
         <e>mmm</e>
         <e>aaa</e>
         <e>mmm</e>
         <e>yyy</e>           
       </d>         
   </first>
   <second>
     <a>456</a>                      
     <c>345</c>
       <d>
         <e>mmm</e>
         <e>hhh</e>            
       </d>
   </second>
 </parent>"""

# This is how I create the root; if you build it differently (e.g. with ET.parse), the output step at the end may need adjusting
root = ET.fromstring(xml_string)

target_node_first_parent = 'd'
target_node = 'e'
target_text = 'mmm'

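# find all <d> nodes
for node in root.iter(target_node_first_parent):
    # findall() returns a list of the direct <e> children of <d>, so removing
    # inside the loop cannot skip siblings (removing while looping over node.iter() can)
    for subnode in node.findall(target_node):
        if subnode.text == target_text:
            node.remove(subnode)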

# output the result         
tree = ET.ElementTree(root)
tree.write('output.xml')

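I first tried to remove the nodes found by root.iter(target_node) directly from the root, but that does not work: remove() only removes direct children of the element it is called on, so each <e> has to be removed from its own parent <d>.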
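If the target elements are not guaranteed to sit directly under <d>, one possible approach is to build a child-to-parent map first and remove each match from whatever parent it has. The following is only a sketch under assumptions: the helper name remove_by_text and the sample XML are made up for illustration, and the dict comprehension needs Python 2.7 or later.

```python
import xml.etree.ElementTree as ET


def remove_by_text(root, tag, text):
    """Hypothetical helper: remove every <tag> whose text equals `text`, at any depth.

    Returns the number of elements removed.
    """
    # ElementTree elements do not store a link to their parent,
    # so build a child -> parent lookup up front.
    parent_of = dict((child, parent) for parent in root.iter() for child in parent)
    # Collect the matches first so the tree is not modified while it is being iterated.
    # The root has no parent and therefore cannot be removed this way.
    targets = [el for el in root.iter(tag) if el.text == text and el in parent_of]
    for el in targets:
        parent_of[el].remove(el)
    return len(targets)


# Made-up sample data: matches under <d> and under a different parent <x>.
root = ET.fromstring(
    "<parent><d><e>mmm</e><e>mmm</e><e>yyy</e></d><x><e>mmm</e></x></parent>"
)
removed = remove_by_text(root, "e", "mmm")
remaining = [el.text for el in root.iter("e")]
```

Here removed counts the deleted elements and remaining lists the text of the <e> elements that are left.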

Nimantha
Queuebee

The answer by @Queuebee is correct, but in case you want to read the XML from a file, the code below provides a way to do that.

import xml.etree.ElementTree as ET

file_loc = " "  # set this to the path of your XML file
xml_tree_obj = ET.parse(file_loc)

xml_roots = xml_tree_obj.getroot()

target_node_first_parent = 'd'
target_node = 'e'
target_text = 'mmm'

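# find all <d> nodes
for node in xml_roots.iter(target_node_first_parent):
    # findall() returns a list of the direct <e> children of <d>, so removing
    # inside the loop cannot skip siblings (removing while looping over node.iter() can)
    for subnode in node.findall(target_node):
        if subnode.text == target_text:
            node.remove(subnode)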

out_tree = ET.ElementTree(xml_roots)
out_tree.write('output.xml')
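The input file in the question starts with an XML declaration, which write() does not emit by default. If the declaration should be kept, the read-modify-write steps could be wrapped roughly as below. This is a sketch, not a definitive implementation: the function name strip_elements, its parameters, and the temporary file names are assumptions made for the demo, and the xml_declaration argument requires Python 2.7 or later.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def strip_elements(src_path, dst_path, parent_tag="d", tag="e", text="mmm"):
    """Hypothetical helper: drop <tag>text</tag> children of every <parent_tag>."""
    tree = ET.parse(src_path)
    for parent in tree.getroot().iter(parent_tag):
        # findall() returns a list of direct children, so removing inside the loop is safe.
        for child in parent.findall(tag):
            if child.text == text:
                parent.remove(child)
    # Ask for the declaration explicitly; it is omitted with the default arguments.
    tree.write(dst_path, encoding="UTF-8", xml_declaration=True)


# Demo on temporary files (made-up sample data).
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "input.xml")
dst = os.path.join(workdir, "output.xml")
with open(src, "w") as handle:
    handle.write("<parent><first><d><e>mmm</e><e>mmm</e><e>yyy</e></d></first></parent>")

strip_elements(src, dst)

result = [el.text for el in ET.parse(dst).getroot().iter("e")]
with open(dst) as handle:
    first_line = handle.readline()
```

After the call, result holds the text of the surviving <e> elements and first_line holds the first line of the written file.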
Nana Owusu