How to get the complete xml with just the required element

Question

import xml.etree.ElementTree as ET
tree = ET.parse('country_data.xml')
root = tree.getroot()
for i in tree.findall('.//rank'):
    print(ET.tostring(i, encoding='unicode'))

Here I want to get all the rank elements while keeping their place in the document structure (that is, together with their enclosing country and data elements).

I am getting the output as

<rank>1</rank>

<rank>4</rank>

<rank>68</rank>

What should I do to get this output instead

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
    </country>
    <country name="Singapore">
        <rank>4</rank>
    </country>
    <country name="Panama">
        <rank>68</rank>
    </country>
</data>

when the input XML file country_data.xml is

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/> First Country
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/> Second Country
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/> Third Country
    </country>
</data>

score 2 · Answer 1 · answered May 19 '14 at 15:54

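You can do it using Python + XSLT. First you will need an XSLT document. The one below performs the transformation you require: the first template copies every node and attribute unchanged, and the second, empty template drops year, gdppc and neighbor elements as well as the text directly inside country: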

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output indent="yes"/>
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"></xsl:apply-templates>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="year|gdppc|neighbor|country/text()" />
</xsl:stylesheet>

You can use lxml to apply the XSLT stylesheet in Python:

import lxml.etree as etree

source = etree.parse("country_data.xml")
xsldoc = etree.parse("stylesheet.xsl")
transform = etree.XSLT(xsldoc)
result = transform(source)
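# encoding="unicode" returns str instead of bytes, so print() shows readable XML
print(etree.tostring(result, pretty_print=True, encoding="unicode"))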

The result of this transformation is:

<?xml version="1.0" encoding="UTF-8"?>
<data>
    <country name="Liechtenstein">
      <rank>1</rank>
   </country>
    <country name="Singapore">
      <rank>4</rank>
   </country>
    <country name="Panama">
      <rank>68</rank>
   </country>
</data>

score 1 · Answer 2 · edited May 23 '17 at 11:57

A simple solution is probably to delete every element that isn't data, country or rank, and then output the root element.
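A minimal sketch of that pruning approach, assuming the two-level data/country layout from the question. The helper name `keep_only` and the shortened inline sample are illustrative only, and `ET.indent` needs Python 3.9 or later:

```python
import xml.etree.ElementTree as ET

# Shortened copy of the question's input, inlined so the sketch is self-contained.
SAMPLE = """<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <neighbor name="Austria" direction="E"/> First Country
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
    </country>
</data>"""


def keep_only(root, tag):
    """Remove, in place, every child of a <country> whose tag is not `tag`."""
    for country in root.findall("country"):
        # Iterate over a copy: removing children while iterating skips elements.
        for child in list(country):
            if child.tag != tag:
                # The trailing text (" First Country") is stored as the tail of
                # the removed element, so it disappears together with it.
                country.remove(child)
    return root


root = keep_only(ET.fromstring(SAMPLE), "rank")
ET.indent(root, space="    ")  # tidy the leftover whitespace (Python 3.9+)
pruned_xml = ET.tostring(root, encoding="unicode")
print(pruned_xml)
```

With a file on disk, you would call `keep_only(tree.getroot(), "rank")` and then `tree.write(...)` instead of parsing the inline string.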

The alternative would be to create a new document with a data root, then iterate over all rank elements, get their immediate parents, copy them as children to data (with all necessary attributes) and then add a copy of the rank element.

But since ElementTree doesn't keep a reference to an element's parent, you need a workaround for that; see the question "access ElementTree node parent node".
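The alternative could look roughly like the sketch below. One common workaround for the missing parent reference is to build a child-to-parent dictionary once. The function name `extract_with_parents` and the inline sample are made up for illustration, and the code assumes the matched elements sit exactly one level below the root's children and have no child elements of their own:

```python
import xml.etree.ElementTree as ET

# Shortened copy of the question's input, inlined so the sketch is self-contained.
SAMPLE = """<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
    </country>
</data>"""


def extract_with_parents(root, tag):
    """Build a new tree holding only `tag` elements and their direct parents."""
    # ElementTree has no parent pointers, so map every child to its parent once.
    parent_of = {child: parent for parent in root.iter() for child in parent}

    new_root = ET.Element(root.tag, dict(root.attrib))
    copies = {}  # original parent element -> its copy in the new tree

    for elem in root.iter(tag):
        parent = parent_of[elem]
        if parent not in copies:
            # Copy the parent's tag and attributes, but none of its children.
            copies[parent] = ET.SubElement(new_root, parent.tag, dict(parent.attrib))
        kept = ET.SubElement(copies[parent], elem.tag, dict(elem.attrib))
        kept.text = elem.text
    return new_root


new_root = extract_with_parents(ET.fromstring(SAMPLE), "rank")
ET.indent(new_root, space="    ")  # pretty-print; requires Python 3.9+
print(ET.tostring(new_root, encoding="unicode"))
```

Unlike pruning, this leaves the original tree untouched, at the cost of having to copy attributes and text by hand.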
