How to remove a subelement in an XML file using Python

Question

I want to remove a specific element together with all of its subelements. To find the element to remove, I want to use either the id of the tag or the name of the tag.

For example, given this etree object:

<?xml version="1.0" ?>
<root>
  <tag_folders>
    <folder id="1">Stars</folder>
    <folder id="2">Planet</folder>
    <folder id="3">Satellite</folder>
  </tag_folders>
  <tags>
    <tag>
      <name>Earth</name>
    </tag>
    <tag id="2">
      <name>Sun</name>
    </tag>
    <tag id="29">
      <name>Moon</name>
    </tag>
</tags>
</root>

For example, I want to remove Moon, the tag with the id "29".

The output I want:

<?xml version="1.0" ?>
<root>
  <tag_folders>
    <folder id="1">Stars</folder>
    <folder id="2">Planet</folder>
    <folder id="3">Satellite</folder>
  </tag_folders>
  <tags>
    <tag>
      <name>Earth</name>
    </tag>
    <tag id="2">
      <name>Sun</name>
    </tag>
 </tags>

</root>

Here is my code:

def remove_tag(root, tag_id_r):
    i = 0
    for tag in root.iter('tag'):
        tag_id = tag.get('id')
        if (tag_id == tag_id_r):
            #root.clear(tag)
            #root.remove(tag)
            #root[1][i].remove(tag)
        # print(i, tag_id, tag_id_r, root[1][i])
        i += 1

def main():
    with open("lib.xml", 'a') as f:
        tree = etree.parse('lib.xml')
        root = tree.getroot()

        remove_tag(root, input("What is the id of the tag you want to remove?"))

        f.seek(0)
        f.truncate()

        dom = minidom.parseString(etree.tostring(tree, encoding="utf-8"))
        print('\n'.join([line for line in dom.toprettyxml(indent=' '*2).split('\n') if line.strip()]), file=f)
main()

I tried everything shown in the comments, but none of it works.

If you can use `lxml` package, XSLT would make this very simple without a single loop. — Parfait, Sep 25 '20 at 17:02

score 1 · Answer 1 · answered Sep 25 '20 at 17:03

Try something like this:

elems = """<?xml version="1.0" ?>
<root>
  <tag_folders>
    <folder id="1">Stars</folder>
    <folder id="2">Planet</folder>
    <folder id="3">Satellite</folder>
  </tag_folders>
  <tags>
    <tag>
      <name>Earth</name>
    </tag>
    <tag id="2">
      <name>Sun</name>
    </tag>
    <tag id="29">
      <name>Moon</name>
    </tag>
   </tags>
</root>
""" #note that the xml has been fixed

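from lxml import etree

doc = etree.XML(elems)
# Select the <tag> with id "29" whose <name> child has the text "Moon".
# (A bare ["Moon"] predicate is always true and would not filter by name.)
to_del = doc.xpath('//tag[@id="29"][name="Moon"]')
for td in to_del:
    td.getparent().remove(td)
print(etree.tostring(doc, pretty_print=True, xml_declaration=True).decode())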

Output:

<?xml version='1.0' encoding='ASCII'?>
<root>
  <tag_folders>
    <folder id="1">Stars</folder>
    <folder id="2">Planet</folder>
    <folder id="3">Satellite</folder>
  </tag_folders>
  <tags>
    <tag>
      <name>Earth</name>
    </tag>
    <tag id="2">
      <name>Sun</name>
    </tag>
    </tags>
</root>

Is there another way to replace `elems`? The content of the folder is not always the same. (Sorry if my question is stupid, but this is a new project and I didn't know ElementTree before.) — Souheil, Sep 28 '20 at 14:01
@Souheil, but your example uses `ElementTree` (i.e., `xml.etree.ElementTree`). Maybe you meant `lxml`? *See* [What are the differences between lxml and ElementTree?](https://stackoverflow.com/q/47229309/1422451) — Parfait, Sep 28 '20 at 14:38

score 1 · Accepted Answer · answered Sep 25 '20 at 17:18

To remove an element in ElementTree (which is what the question is tagged with, but no import is shown) you must first get the parent element (in this case tags). (lxml has the .getparent() method shown in Jack Fleeting's answer.)

Also, you shouldn't have to open the file and truncate it if you really want to overwrite it; just use the .write() method of the ElementTree object.

Example...

XML Input (lib.xml; "</tags>" added to make it well-formed)

<root>
  <tag_folders>
    <folder id="1">Stars</folder>
    <folder id="2">Planet</folder>
    <folder id="3">Satellite</folder>
  </tag_folders>
  <tags>
    <tag>
      <name>Earth</name>
    </tag>
    <tag id="2">
      <name>Sun</name>
    </tag>
    <tag id="29">
      <name>Moon</name>
    </tag>
  </tags>
</root>

Python

import xml.etree.ElementTree as etree


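def remove_tag(root, tag_id_r):
    tags_elem = root.find("tags")
    target_tag = tags_elem.find(f"tag[@id='{tag_id_r}']")
    # Compare with None: an Element without children is falsy, so a plain
    # truth test would miss a matching but empty <tag>.
    if target_tag is not None:
        tags_elem.remove(target_tag)
    else:
        print(f"A tag with the id \"{tag_id_r}\" cannot be found.")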


def main():
    tree = etree.parse("lib.xml")
    root = tree.getroot()

    remove_tag(root, input("What is the id of the tag you want to remove? "))

    # Overwriting the input file. Are you sure that's a good idea?
    tree.write("lib.xml", encoding="utf-8")


main()

XML Output (updated lib.xml)

<root>
  <tag_folders>
    <folder id="1">Stars</folder>
    <folder id="2">Planet</folder>
    <folder id="3">Satellite</folder>
  </tag_folders>
  <tags>
    <tag>
      <name>Earth</name>
    </tag>
    <tag id="2">
      <name>Sun</name>
    </tag>
    </tags>
</root>
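
The function above assumes every tag sits directly under tags. When the parent is not known in advance, or the tag should be found by its name rather than its id (which the question also asks about), one option with the standard library is to build a child-to-parent lookup first. The following is only a minimal sketch and not part of the original answer; `remove_matching`, `SAMPLE`, and the shortened sample document are hypothetical names and data used for illustration.

```python
import xml.etree.ElementTree as etree

# Shortened, hypothetical sample based on the XML in the question.
SAMPLE = """<root>
  <tags>
    <tag><name>Earth</name></tag>
    <tag id="2"><name>Sun</name></tag>
    <tag id="29"><name>Moon</name></tag>
  </tags>
</root>"""


def remove_matching(root, path):
    """Remove every element matched by an ElementTree path; return the count."""
    # ElementTree elements do not know their parent, so build a lookup first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    matches = root.findall(path)
    for elem in matches:
        parent_of[elem].remove(elem)
    return len(matches)


root = etree.fromstring(SAMPLE)
# Match by attribute value.
removed_by_id = remove_matching(root, ".//tag[@id='29']")
# Match by the text of the <name> child.
removed_by_name = remove_matching(root, ".//tag[name='Sun']")
remaining = [t.findtext("name") for t in root.iter("tag")]
print(removed_by_id, removed_by_name, remaining)
```

The lookup is rebuilt on every call, which keeps the sketch simple; for large documents or many removals it would probably be worth building it once and reusing it.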

I have several functions, one to add tags, one to add folders... and sometimes the XML file doesn't exist, so I have to create it. I tried `with open("lib.xml", 'w') as f:` but it doesn't work for me and I don't know why. — Souheil, Sep 28 '20 at 10:12
@Souheil - Even if the xml file doesn't exist, you shouldn't have to create it like that. If it doesn't exist, create a new tree instead of parsing the existing file. The serialization of the tree (writing to a file) doesn't change. — Daniel Haley, Sep 28 '20 at 14:53
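
The suggestion in the comment above could look roughly like the sketch below: parse the file when it exists, otherwise start from a new tree, and write it out the same way in both cases. The helper name `load_or_create` and the empty `tag_folders`/`tags` skeleton are assumptions based on the layout in the question, not something given in the thread.

```python
import os
import tempfile
import xml.etree.ElementTree as etree


def load_or_create(path):
    """Return the parsed tree if the file exists, else a new empty one."""
    if os.path.exists(path):
        return etree.parse(path)
    # Assumed skeleton, mirroring the layout shown in the question.
    root = etree.Element("root")
    etree.SubElement(root, "tag_folders")
    etree.SubElement(root, "tags")
    return etree.ElementTree(root)


# Demonstration in a temporary directory so no real file is touched.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "lib.xml")

    tree = load_or_create(path)  # file is missing, so a new tree is built
    tag = etree.SubElement(tree.getroot().find("tags"), "tag", id="29")
    etree.SubElement(tag, "name").text = "Moon"
    tree.write(path, encoding="utf-8", xml_declaration=True)

    reloaded = load_or_create(path)  # file exists now, so it is parsed
    names = [n.text for n in reloaded.getroot().iter("name")]
```

With this shape, the functions that add tags or folders never need to open the file themselves; they only work on the tree, and `tree.write()` is the single place where the file is created or overwritten.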

score 0 · Answer 3 · answered Sep 26 '20 at 03:31

Another method.

from simplified_scrapy import SimplifiedDoc
html = """
<?xml version="1.0" ?>
<root>
  <tag_folders>
    <folder id="1">Stars</folder>
    <folder id="2">Planet</folder>
    <folder id="3">Satellite</folder>
  </tag_folders>
  <tags>
    <tag>
      <name>Earth</name>
    </tag>
    <tag id="2">
      <name>Sun</name>
    </tag>
    <tag id="29">
      <name>Moon</name>
    </tag>
</root>
"""
doc = SimplifiedDoc(html)
tag29 = doc.select('tag#29')
# Or
# tag29 = doc.getElementByText('Moon',tag='tag')
tag29.remove()
print (doc.html)

Result:

<?xml version="1.0" ?>
<root>
  <tag_folders>
    <folder id="1">Stars</folder>
    <folder id="2">Planet</folder>
    <folder id="3">Satellite</folder>
  </tag_folders>
  <tags>
    <tag>
      <name>Earth</name>
    </tag>
    <tag id="2">
      <name>Sun</name>
    </tag>
</root>
