I have a lot of XML files that contain an element of the form:

<Element fruit="apple" animal="cat" />

I want this element removed from each file.

Using an XSLT stylesheet and the Linux command-line utility xsltproc, how could I do this?

By this point in the script I already have the list of files containing the element I wish to remove, so the single file can be used as a parameter.


EDIT: the original question did not make my intent clear.

What I am trying to achieve is to remove the entire "Element" element where (fruit=="apple" && animal=="cat"). The same document contains many other elements named "Element", and those should remain. So

<Element fruit="orange" animal="dog" />
<Element fruit="apple"  animal="cat" />
<Element fruit="pear"   animal="wild three eyed mongoose of kentucky" />

Would become:

<Element fruit="orange" animal="dog" />
<Element fruit="pear"   animal="wild three eyed mongoose of kentucky" />
Grundlefleck

2 Answers

This calls for one of the most fundamental XSLT design patterns, "overriding the identity transformation". The whole stylesheet is just:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output omit-xml-declaration="yes"/>

    <!-- identity template: copies every node and attribute as-is -->
    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>

    <!-- empty template: produces no output for the matching elements -->
    <xsl:template match="Element[@fruit='apple' and @animal='cat']"/>
</xsl:stylesheet>

Note how the second template overrides the identity (first) template only for elements named "Element" that have an attribute "fruit" with value "apple" and an attribute "animal" with value "cat". This template has an empty body, so the matched element is simply ignored (nothing is produced when it is matched).

When this transformation is applied to the following source XML document:

<doc>... 
    <Element name="same">foo</Element>...
    <Element fruit="apple" animal="cat" />
    <Element fruit="pear" animal="cat" />
    <Element name="same">baz</Element>...
    <Element name="same">foobar</Element>...
</doc>

the wanted result is produced:

<doc>... 
    <Element name="same">foo</Element>...
    <Element fruit="pear" animal="cat"/>
    <Element name="same">baz</Element>...
    <Element name="same">foobar</Element>...
</doc>

More code snippets of using and overriding the identity template can be found here.

Dimitre Novatchev
  • Despite me not even asking the right question, you've answered exactly what I should have asked! :) – Grundlefleck Nov 27 '08 at 09:50
  • Why don't you mark this post as the correct answer then? Then it would disappear from the list of unanswered problems. – Dirk Vollmar Nov 27 '08 at 16:51
  • Had to wait till I verified it worked, and didn't get a chance at work today. Done now though, thanks Dimitre. – Grundlefleck Nov 27 '08 at 19:50
  • Could you tell me what the abbreviated version of this XPath expression is: `/bookstore/book[position() = 1 or position() = 3]/@*`? – Arup Rakshit Jul 30 '13 at 10:51
  • @Babai, `/*/book[position() = 1 or position() = 3]/@*`. In XPath 2.0: `/*/book[position() = (1,3)]/@*` – Dimitre Novatchev Jul 30 '13 at 14:16
  • Thanks for your response. How does `.//title` differ from `./title`? When should I use `.//title` and when should I use `//title`? – Arup Rakshit Jul 30 '13 at 14:19
  • Thanks for the note about how the second template overrides the first. Although I can see how that makes sense now, it wasn't anywhere on my radar for how I thought it worked before. I was badly confused why any examples online were working at all. – ArtOfWarfare Sep 23 '16 at 15:08
  • Unfortunately, your "identity transformation" link is broken. This link appears to be what it originally was referencing: https://www.w3.org/TR/2009/PER-xslt20-20090421/#copying – Brad Turek Dec 03 '19 at 07:48
  • @BradTurek Thanks for noticing this. I just updated the document with the new link. The W3C had updated their website and unfortunately changed the URLs to the specifications, thus breaking any document that contains the original URLs. – Dimitre Novatchev Dec 03 '19 at 15:12
The answer by @Dimitre Novatchev is certainly both correct and elegant, but there's a generalization (that the OP didn't ask about): what if the element you want to filter also has child elements or text that you want to keep?

I believe this minor variation covers that case:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    version="2.0">

    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>

    <!-- drop DropMe elements, keeping child text and elements -->
    <xsl:template match="DropMe">
        <xsl:apply-templates/>
    </xsl:template>

</xsl:stylesheet>

The match pattern can be made more specific, for example by adding predicates on attributes, and you can add further templates of this kind if you need to drop other elements.
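As a minimal sketch of a more specific match pattern, the stylesheet below reuses the predicate from the first answer, so only the "Element" with fruit "apple" and animal "cat" is unwrapped. This combination is an illustration rather than something either answer provides, and it uses only XSLT 1.0 constructs.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">

    <!-- identity template: copy every node and attribute unchanged -->
    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>

    <!-- unwrap only the matching Element: its tags and attributes are
         dropped, while its child text and elements are processed and kept -->
    <xsl:template match="Element[@fruit='apple' and @animal='cat']">
        <xsl:apply-templates/>
    </xsl:template>

</xsl:stylesheet>
```

Saved under a name of your choosing, say unwrap.xsl (a hypothetical filename), it can be applied to a single file with `xsltproc unwrap.xsl input.xml`, which writes the result to standard output. Every other "Element" falls through to the identity template and is copied unchanged.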

So this input:

<?xml version="1.0" encoding="UTF-8"?>
<mydocument>
    <p>Here's text to keep</p>
    <p><DropMe>Keep this text but not the element</DropMe>; and keep what follows.</p>
    <p><DropMe>Also keep this text and <b>this child element</b> too</DropMe>, along with what follows.</p>
</mydocument>

produces this output:

<?xml version="1.0" encoding="UTF-8"?><mydocument>
    <p>Here's text to keep</p>
    <p>Keep this text but not the element; and keep what follows.</p>
    <p>Also keep this text and <b>this child element</b> too, along with what follows.</p>
</mydocument>

Credit to XSLT Cookbook.

Sboisen