
I'm fairly new to Python and to the XML world, and I'm running out of time to finish this project, so I'd really appreciate some help. I have an XML file that I need to process before importing it into Excel. My XML is structured as follows (a very small extract):

<?xml version="1.0" encoding="UTF-8"?>
<Application>
    <first/>
    <second>
        <third/>
        <third/>
        <third/>
    </second>
</Application>

What I need to do is parse the XML file (with ElementTree or lxml) and remove <first/> and the <second> wrapper while keeping the <third/> elements inside it, in order to get something like this:

<?xml version="1.0" encoding="UTF-8"?>
<Application>
        <third/>
        <third/>
        <third/>      
</Application>

I have already read and tried basically all the related questions I could find, but all I managed to achieve was to eliminate the whole <first/> element.

I'm using Python 3.6.2, standard libraries are preferred (lxml, elementtree).

Thanks in advance for any help you can give!

Luke
  • Welcome to SO. Please have a look at [tour]. You may also want to check [What topics can I ask about](http://stackoverflow.com/help/on-topic), and [ask], and how to create a [mcve]. Post the code you have tried and the errors you have received. Be as specific as possible as it will lead to better answers. Show us the code you are using in addition to the xml you need – happymacarts Oct 26 '17 at 15:56
  • https://stackoverflow.com/questions/23498394/remove-a-node-from-etree-but-leaving-child – Abdul Niyas P M Oct 26 '17 at 16:29
  • Thanks @ABDUL NIYAS P M, but I've already tried that. The problem is that I need to parse the XML file; I cannot copy it into the Python script manually. What would you suggest? In other words, how can I combine "with open ... as ..." with the code shown in the solution you linked? – Luke Oct 26 '17 at 16:38
  • @Luke You can read xml file like this. "import xml.etree.ElementTree as ET tree = ET.parse('your_xml_file.xml')" – Abdul Niyas P M Oct 26 '17 at 16:43
  • @Luke You can parse the xml from string as well as from file. [more info](https://docs.python.org/2/library/xml.etree.elementtree.html#parsing-xml) – Abdul Niyas P M Oct 26 '17 at 16:44

1 Answer


The task is to remove the intermediate parent nodes in the given example while keeping their children. (Application is the root; first and second are the nodes; the third elements are the inner_nodes.)

1) Load your XML and get the root element ('Application' here).

2) Get the lists of inner_nodes (tree -> nodes -> inner_nodes) for your tree.

3) Collect all the inner_nodes (the nodes named 'third' here).

4) Remove the immediate children of the root, 'Application'.

5) Append all the inner_nodes to your root.
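The steps above can be sketched as one small function. This is only an illustration: the helper name `promote_grandchildren` is made up, and the XML is assumed to be available as a string so the example runs on its own.

```python
import xml.etree.ElementTree as ET

# Sample document from the question, inlined so the sketch is self-contained.
XML = """<Application>
    <first/>
    <second>
        <third/>
        <third/>
        <third/>
    </second>
</Application>"""


def promote_grandchildren(root):
    """Replace root's children with root's grandchildren (hypothetical helper)."""
    # Collect the inner nodes first, before the tree is modified.
    grandchildren = [inner for node in root for inner in node]
    # Remove the immediate children; iterate over a copy while removing.
    for node in list(root):
        root.remove(node)
    # Re-attach the inner nodes directly under the root.
    root.extend(grandchildren)
    return root


root = promote_grandchildren(ET.fromstring(XML))
print([child.tag for child in root])
```

Collecting the inner nodes before removing anything matters: once a node is detached from the root, it is no longer reachable by iterating over the root.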

yourxmlfile.xml

<?xml version="1.0" encoding="UTF-8"?>
<Application>
    <first/>
    <second>
        <third/>
        <third/>
        <third/>
    </second>
</Application>

You can read your XML file with etree.parse():

>>> import xml.etree.ElementTree as etree
>>> tree = etree.parse('yourxmlfile.xml')
>>> root = tree.getroot()  # parse() returns an ElementTree; work on its root Element
>>> etree.tostring(root)
b'<Application>\n    <first />\n    <second>\n        <third />\n        <third />\n        <third />\n    </second>\n</Application>'
>>> inner_nodes = [list(node) for node in root]  # getchildren() was removed in Python 3.9
>>> print(inner_nodes)
[[], [<Element 'third' at 0x10c272818>, <Element 'third' at 0x10c2727c8>, <Element 'third' at 0x10c272778>]]
>>> for node in list(root):  # iterate over a copy while removing
...     root.remove(node)
... 
>>> etree.tostring(root)
b'<Application>\n    </Application>'
>>> for children in inner_nodes:  # re-attach the inner nodes under the root
...     root.extend(children)
... 
>>> etree.tostring(root)
b'<Application>\n    <third />\n        <third />\n        <third />\n    </Application>'
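For working on a file from start to finish, without going through a string at all, a minimal sketch might look like the following. The output name `flattened.xml` is a placeholder, and the sample file is written to a temporary directory only so the example is runnable as-is; with a real file you would pass its path straight to `ET.parse()`.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<Application>
    <first/>
    <second>
        <third/>
        <third/>
        <third/>
    </second>
</Application>"""

# Create a throwaway input file (stand-in for your real XML file).
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "yourxmlfile.xml")
dst = os.path.join(workdir, "flattened.xml")  # hypothetical output name
with open(src, "w", encoding="utf-8") as f:
    f.write(SAMPLE)

# Parse from disk and flatten the tree in memory.
tree = ET.parse(src)
root = tree.getroot()
inner_nodes = [inner for node in root for inner in node]
for node in list(root):
    root.remove(node)
root.extend(inner_nodes)

# Write the modified tree back out, including the XML declaration.
tree.write(dst, encoding="UTF-8", xml_declaration=True)

# Read the result back to check it.
result = ET.parse(dst).getroot()
print([child.tag for child in result])
```

`etree.tostring()` is only needed for inspecting the tree; `tree.write()` saves it directly.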
Keerthana Prabhakaran
  • Thanks for your input but it doesn't work. Do you know how to do that without making it a string? – Luke Oct 26 '17 at 17:03
  • I'm not making it a string anywhere else other than printing. I've used etree.parse() ! Can you share the traceback of the error that you get? – Keerthana Prabhakaran Oct 26 '17 at 18:15