I have an XML file with a structure like the following:
<?xml version = '1.0' encoding="ISO-8859-1"?>
<!DOCTYPE stuff PUBLIC "stuff" "stuff.dtd">
<stuff>
  <level1>
    <type>foo</type>
    <name>name1_A</name>
    <junk1>garbage</junk1>
    <junk2>garbage</junk2>
    <level2>
      <name>name2_A</name>
      <junk3>garbage</junk3>
      <junk4>garbage</junk4>
      <level3>
        <name>name3_A</name>
        <junk5>garbage</junk5>
        <junk6>garbage</junk6>
      </level3>
      <level3>
        <name>name3_B</name>
        <junk5>garbage</junk5>
        <junk6>garbage</junk6>
      </level3>
    </level2>
    <level2>
      <name>name2_B</name>
      <junk>garbage</junk>
      <level3>
        <name>name3_A</name>
        <junk>garbage</junk>
      </level3>
      <level3>
        <name>name3_B</name>
        <junk>garbage</junk>
      </level3>
    </level2>
  </level1>
  <level1>
    <type>foo</type>
    <name>name1_B</name>
    <junk1>garbage</junk1>
    <junk2>garbage</junk2>
    <level2>
      <name>name2_A</name>
      <junk3>garbage</junk3>
      <junk4>garbage</junk4>
      <level3>
        <name>name3_A</name>
        <junk5>garbage</junk5>
        <junk6>garbage</junk6>
      </level3>
      <level3>
        <name>name3_B</name>
        <junk5>garbage</junk5>
        <junk6>garbage</junk6>
      </level3>
    </level2>
    <level2>
      <name>name2_B</name>
      <junk>garbage</junk>
      <level3>
        <name>name3_A</name>
        <junk>garbage</junk>
      </level3>
      <level3>
        <name>name3_B</name>
        <junk>garbage</junk>
      </level3>
    </level2>
  </level1>
</stuff>
I'd like to write an XSLT stylesheet that strips out all the junk* elements. More precisely, I know the names of the elements I want to keep and want to drop everything else. Given the input above, the desired output has every junk element removed:
<?xml version = '1.0' encoding="ISO-8859-1"?>
<!DOCTYPE stuff PUBLIC "stuff" "stuff.dtd">
<stuff>
  <level1>
    <type>foo</type>
    <name>name1_A</name>
    <level2>
      <name>name2_A</name>
      <level3>
        <name>name3_A</name>
      </level3>
      <level3>
        <name>name3_B</name>
      </level3>
    </level2>
    <level2>
      <name>name2_B</name>
      <level3>
        <name>name3_A</name>
      </level3>
      <level3>
        <name>name3_B</name>
      </level3>
    </level2>
  </level1>
  <level1>
    <type>foo</type>
    <name>name1_B</name>
    <level2>
      <name>name2_A</name>
      <level3>
        <name>name3_A</name>
      </level3>
      <level3>
        <name>name3_B</name>
      </level3>
    </level2>
    <level2>
      <name>name2_B</name>
      <level3>
        <name>name3_A</name>
      </level3>
      <level3>
        <name>name3_B</name>
      </level3>
    </level2>
  </level1>
</stuff>
Keep in mind that the junk elements in my sample could be named anything. I have a list of the element paths I want to keep (e.g. level1/type, level1/name, level1/level2/name, level1/level2/level3/name, etc.) and want to drop everything else.
The best I've got so far is the XSLT below, but it requires me to explicitly list the element names I want to remove rather than the ones I want to keep, so it's less than ideal:
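<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="no"/>

  <!-- Identity transform: copy every attribute and node unchanged. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Blacklist: every unwanted element name has to be listed here
       (including the plain junk elements from the sample input). -->
  <xsl:template match="junk | junk1 | junk2 | junk3 | junk4 | junk5 | junk6"/>
</xsl:stylesheet>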
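To make the goal concrete, the inverse of the stylesheet above would be a keep-list. The following is only a rough sketch of that idea, assuming XSLT 1.0 and using just the paths from my sample as the whitelist (my real list is longer). It relies on the standard rule that a named pattern such as level1/name has a higher default priority than the catch-all *, so listed elements are copied and everything else falls through to the empty template:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="no"/>

  <!-- Whitelist (assumed from the sample): only these elements are copied. -->
  <xsl:template match="stuff | stuff/level1
                       | level1/type | level1/name | level1/level2
                       | level2/name | level2/level3
                       | level3/name">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Attributes, text, comments and processing instructions that sit
       inside a kept element are copied as-is. -->
  <xsl:template match="@* | text() | comment() | processing-instruction()">
    <xsl:copy/>
  </xsl:template>

  <!-- Catch-all: any element not matched above is dropped together
       with its entire subtree, because no templates are applied to it. -->
  <xsl:template match="*"/>
</xsl:stylesheet>
```

Note that the whitespace-only text nodes surrounding a dropped element are still copied, so the output may contain blank lines where the junk used to be. Adding a top-level xsl:strip-space with elements="*" and switching to indent="yes" is one way that could be tidied up.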