
I have a complex XML file whose structure is hard to see because each node at level 2 has thousands of children. I'd like to truncate the XML like this:

<main>
  <a type="x">
    <b>
    <b>  # should be deleted
    <b>  # should be deleted
    # thousand others
  </a>
  <a type="y">
    <c>
    <c>  # should be deleted
    # many others
  </a>
  <a type="z">
    <d>
    <d>  # should be deleted
    # many others
  </a>
</main>

How can I keep only one child for each node at level 2, 3, etc., and export the result?

I tried this, but nothing seems deleted:

import xml.etree.ElementTree as ET
tree = ET.parse('in.xml')
root = tree.getroot()    

for l1 in root:
    print(l1, l1.tag, l1.attrib)
    for i, l2 in enumerate(l1):
        print(i, l2)
        if i > 0:
            l1.remove(l2)    # nothing seems removed, why?

tree.write('out.xml')            
Basj
  • https://stackoverflow.com/questions/15168259/python-xml-remove-some-elements-and-their-children-but-keep-specific-elements-an – Ajay Feb 05 '21 at 16:12
  • Do you know why my code fails? It seems ok @Ajay. – Basj Feb 05 '21 at 17:12

2 Answers


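The problem came from modifying the list of children while iterating over it: each remove() shifts the later children down one position, so the loop skips every other child. Iterating over a copy made with list(l1) solves it: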

for l1 in root:
    # list(l1) is a copy, so removing from l1 does not disturb this loop
    for l2 in list(l1)[1:]:
        l1.remove(l2)
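To see the skipping effect concretely, here is a small self-contained sketch on a made-up element with six <b> children (the sample data and the attribute n are hypothetical, chosen only to label each child). Removing while iterating deletes only every other child, whereas iterating over a copy leaves just the first one:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one parent with six labelled children
SAMPLE = "<a><b n='0'/><b n='1'/><b n='2'/><b n='3'/><b n='4'/><b n='5'/></a>"


def naive(parent):
    # Buggy: each removal shifts the later children down one slot,
    # so the index-based iteration skips the child right after it.
    for i, child in enumerate(parent):
        if i > 0:
            parent.remove(child)


def with_copy(parent):
    # Fixed: iterate over a snapshot list, so removals don't affect the loop.
    for child in list(parent)[1:]:
        parent.remove(child)


a1 = ET.fromstring(SAMPLE)
naive(a1)
print([b.get("n") for b in a1])  # ['0', '2', '4']

a2 = ET.fromstring(SAMPLE)
with_copy(a2)
print([b.get("n") for b in a2])  # ['0']
```

With thousands of children, the naive loop still leaves about half of them, which is why it looks as if nothing was removed.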
Basj
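import xml.etree.ElementTree as ET
tree = ET.parse('names.xml')
root = tree.getroot()

# Snapshot root.iter(): modifying the tree while iter() runs is undefined
for l1 in list(root.iter()):
    for child in l1:
        # child[1:] returns a new list, so removing from child is safe here
        for g_child in child[1:]:
            child.remove(g_child)

tree.write('out.xml')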
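The first answer only trims the children of the level-1 nodes, while this one trims every deeper level. A recursive variant does the same without looping over the tree while changing it. This is a minimal sketch, not taken from either answer; the function name truncate and the sample XML are hypothetical. It keeps every child of the root (the <a> elements) and only the first child at each deeper level, matching the example in the question:

```python
import xml.etree.ElementTree as ET


def truncate(elem, depth=0):
    """Keep only the first child of every element below the root.

    All children of the root (depth 0) are kept, like the <a type=...>
    elements in the question; every deeper element keeps one child.
    """
    children = list(elem)  # snapshot taken before any removal
    if depth > 0:
        for extra in children[1:]:
            elem.remove(extra)
        children = children[:1]
    for child in children:
        truncate(child, depth + 1)


# Hypothetical sample shaped like the XML in the question
SAMPLE = """<main>
  <a type="x"><b><e/><e/></b><b/><b/></a>
  <a type="y"><c/><c/></a>
  <a type="z"><d/><d/><d/></a>
</main>"""

root = ET.fromstring(SAMPLE)
truncate(root)
print(ET.tostring(root, encoding="unicode"))
```

For a file, the same call works on a parsed tree: `tree = ET.parse('in.xml')`, then `truncate(tree.getroot())` and `tree.write('out.xml')`. The recursion depth equals the XML nesting depth, which is normally far below Python's recursion limit.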
Ajay