13

The following does the job of removing unwanted elements and attributes by name ("removeMe" in this example) from an XML file:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="node() | @*" name="identity">
  <xsl:copy>
   <xsl:apply-templates select="node() | @*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="removeMe"/>
</xsl:stylesheet>

The problems are that it does not distinguish between elements and attributes, the name is hard-coded, and it can only take one name. How could this be rewritten to use a couple of input parameters, like those below, to remove one or more specific elements and/or attributes?

<xsl:param name="removeElementsNamed"/>
<xsl:param name="removeAttributesNamed"/>

The desired result is the ability to remove one or more elements and/or one or more attributes while still distinguishing between elements and attributes (in other words, it should be possible to remove all "time" elements without also removing all "time" attributes).

While I required XSLT 1.0 this round, the XSLT 2.0 solutions in the accepted and other answers may be useful to others.

abatishchev
Witman
  • Are you able to use XSLT 2.0? – Daniel Haley Feb 21 '12 at 22:19
  • @DevNull - Good question. I just asked it [here](http://stackoverflow.com/questions/9387396/upgrading-xslt-1-0-to-xslt-2-0). – Witman Feb 22 '12 at 00:22
  • Thanks to all the good input on answers, question has been expanded to clarify desired function, adding attribute removal functionality as a distinct feature (not to be lumped together with element removal, yet available in same code). – Witman Mar 02 '12 at 01:29

4 Answers

24

This transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:param name="removeElementsNamed" select="'x'"/>

 <xsl:template match="node()|@*" name="identity">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="*">
  <xsl:if test="not(name() = $removeElementsNamed)">
   <xsl:call-template name="identity"/>
  </xsl:if>
 </xsl:template>
</xsl:stylesheet>

when applied on any XML document, say this:

<t>
    <a>
        <b/>
        <x/>
    </a>
    <c/>
    <x/>
    <d/>
</t>

produces the wanted, correct result: a copy of the source XML document in which every occurrence of an element whose name is the value of the $removeElementsNamed parameter is deleted:

<t>
   <a>
      <b/>
   </a>
   <c/>
   <d/>
</t>

Do note: In XSLT 1.0 it is syntactically illegal to have a variable or parameter reference inside a template match pattern. This is why the solutions by @Jan Thomä and @treeMonkey both raise an error with any XSLT 1.0-compliant processor.

Update: Here is a more complicated solution that accepts a pipe-separated list of the names of the elements to delete (the list must start and end with a pipe, as in '|x|c|'):

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:param name="removeElementsNamed" select="'|x|c|'"/>

 <xsl:template match="node()|@*" name="identity">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="*">
  <xsl:if test=
   "not(contains($removeElementsNamed,
                 concat('|',name(),'|' )
                 )
        )
   ">
   <xsl:call-template name="identity"/>
  </xsl:if>
 </xsl:template>
</xsl:stylesheet>

When applied to the same XML document (above), the transformation again produces the wanted, correct output: the source XML document with all elements whose names are specified in the $removeElementsNamed parameter deleted:

<t>
   <a>
      <b/>
   </a>
   <d/>
</t>

Update2: The same transformation as in Update1, but written in XSLT 2.0:

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:param name="removeElementsNamed" select="'|x|c|'"/>

 <xsl:template match="node()|@*" name="identity">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match=
 "*[name() = tokenize($removeElementsNamed, '\|')]"/>
</xsl:stylesheet>

Update: The OP has added the requirement to also be able to delete all attributes that have some specific name.

Here is the slightly modified transformation to accommodate this new requirement:

<xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output omit-xml-declaration="yes" indent="yes"/>
     <xsl:strip-space elements="*"/>

     <xsl:param name="removeElementsNamed" select="'x'"/>
     <xsl:param name="removeAttributesNamed" select="'n'"/>

     <xsl:template match="node()|@*" name="identity">
      <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
      </xsl:copy>
     </xsl:template>

     <xsl:template match="*">
      <xsl:if test="not(name() = $removeElementsNamed)">
       <xsl:call-template name="identity"/>
      </xsl:if>
     </xsl:template>

     <xsl:template match="@*">
      <xsl:if test="not(name() = $removeAttributesNamed)">
       <xsl:call-template name="identity"/>
      </xsl:if>
     </xsl:template>
</xsl:stylesheet>

When this transformation is applied on the XML document below (the one used before but with a few attributes added):

<t>
    <a>
        <b m="1" n="2"/>
        <x/>
    </a>
    <c/>
    <x/>
    <d n="3"/>
</t>

the wanted, correct result is produced (all elements named x and all attributes named n are deleted):

<t>
   <a>
      <b m="1"/>
   </a>
   <c/>
   <d/>
</t>

UPDATE2: As again requested by the OP, we now implement the capability to pass a pipe-separated list of names for the elements to delete and, separately, a pipe-separated list of names for the attributes to delete:

<xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output omit-xml-declaration="yes" indent="yes"/>
     <xsl:strip-space elements="*"/>

     <xsl:param name="removeElementsNamed" select="'|c|x|'"/>
     <xsl:param name="removeAttributesNamed" select="'|n|p|'"/>

     <xsl:template match="node()|@*" name="identity">
      <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
      </xsl:copy>
     </xsl:template>

     <xsl:template match="*">
      <xsl:if test=
      "not(contains($removeElementsNamed,
                    concat('|', name(), '|')
                    )
           )
      ">
       <xsl:call-template name="identity"/>
      </xsl:if>
     </xsl:template>

     <xsl:template match="@*">
      <xsl:if test=
      "not(contains($removeAttributesNamed,
                    concat('|', name(), '|')
                    )
           )
       ">
       <xsl:call-template name="identity"/>
      </xsl:if>
     </xsl:template>
</xsl:stylesheet>

When this transformation is applied on the following XML document:

<t>
    <a p="0">
        <b m="1" n="2"/>
        <x/>
    </a>
    <c/>
    <x/>
    <d n="3"/>
</t>

the wanted, correct result is produced (elements with names c and x and attributes with names n and p are deleted):

<t>
   <a>
      <b m="1"/>
   </a>
   <d/>
</t>
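
For readers who can use XSLT 2.0, the final XSLT 1.0 transformation above could probably be condensed along the following lines. This is a sketch rather than part of the original answer: it assumes the same pipe-delimited parameter format and simply applies the tokenize() match from the earlier XSLT 2.0 update to attributes as well.

```xml
<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <!-- Assumed format, as above: pipe-separated names -->
 <xsl:param name="removeElementsNamed" select="'|c|x|'"/>
 <xsl:param name="removeAttributesNamed" select="'|n|p|'"/>

 <!-- Identity rule: copies every node that no other rule matches -->
 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <!-- Empty rules: matching nodes produce no output.
      XSLT 2.0 permits variable references in match patterns. -->
 <xsl:template match="*[name() = tokenize($removeElementsNamed, '\|')]"/>
 <xsl:template match="@*[name() = tokenize($removeAttributesNamed, '\|')]"/>
</xsl:stylesheet>
```

Because each list is tokenized separately, an element and an attribute that share a name (for example "time") are still handled independently. The empty tokens produced by the leading and trailing pipes never equal a node name, so they are harmless.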
Dimitre Novatchev
  • How would you handle multiple element names passed in the param? (The OP implies multiple names based on the plural usage of "elements" in `$removeElementsNamed`) – Daniel Haley Feb 21 '12 at 22:26
  • oi! mines not in the match patern! should work, however i cannot test at home dont have any dev software atm :_( – Treemonkey Feb 21 '12 at 22:28
  • @DevNull: He wants to say: Remove all elements named XXX. This is why he uses plural. Of course, if the OP clarifies that he needs to delete elements that have a name from a list of names, I would be glad to give him the appropriate solution. :) – Dimitre Novatchev Feb 21 '12 at 22:32
  • @DevNull: I updated this answer with solution to the multiple-names problem. Thanks for asking this. – Dimitre Novatchev Feb 21 '12 at 23:05
  • @DevNull: A second update, giving the corresponding XSLT 2.0 solution -- while I haven't looked at yours for a second, both seem almost the same... I solemnly declare that I haven't copied and pasted your code :). It is a "shortcoming" of XSLT 2.0 that it is much easier than in 1.0 for two developers to come with the same solution at the same time. – Dimitre Novatchev Feb 21 '12 at 23:13
  • Thanks Dimitre! Great comprehensive answer +1. I can't use XSLT 2.0 this time, but plan to move the direction in the future. I did initially mean remove multiple instances of a single named element, as you understood, but the goal was a flexible Javascript launched transform and the option to remove multiple named elements at once is even better--so I've used update 1. Work beautifully! – Witman Mar 01 '12 at 21:19
  • @Witman: You are welcome. Is your update a new question? I don't see any edits to have been done to your question. – Dimitre Novatchev Mar 01 '12 at 21:28
  • @DimitreNovatchev: Sure, I'll edit the question for the benefit of future readers. Adding second parameter $removeAttributesNamed and copying second template with match="@*" (instead of match="*"), as well as referencing new parameter in this added template works as expected. Thanks again! – Witman Mar 01 '12 at 22:17
  • @DimitreNovatchev: Would you like to update your answer with the $removeAttributesNamed feature? – Witman Mar 01 '12 at 23:35
  • +1 should now be immediate for OP to extend your answer to other more complex cases – Emiliano Poggi Mar 02 '12 at 01:05
  • @Witman: I have updated my answer with a solution that implements the new requirement -- see the update at the very end of the answer. – Dimitre Novatchev Mar 02 '12 at 01:37
  • @Dimitre: Oops, I missed that this did not retain the pipe separate list. Maybe my question wasn't clear. When you have a chance, please edit your last update to include ability to remove multiple elements at a time and/or multiple attributes at a time. – Witman Mar 02 '12 at 01:57
  • @DimitreNovatchev: Beautiful. Thank you, and sorry for the confusion. I hope I've made the question clear now! – Witman Mar 02 '12 at 02:41
3

Here's an XSLT 2.0 option if you can use 2.0. The element names can be passed as a comma-, tab-, pipe-, or space-separated list.

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">  
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:param name="removeElementsNamed" select="'bar,baz'"/>  

  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="*[name()=tokenize($removeElementsNamed,'[\|, \t]')]"/>  

</xsl:stylesheet>
Daniel Haley
0
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   <xsl:output omit-xml-declaration="yes" indent="yes"/>
   <xsl:param name="removeMe"/>

   <xsl:template match="node() | @*">
      <xsl:if test="not(name(.)=$removeMe)">
        <xsl:copy>
           <xsl:apply-templates select="node() | @*"/>
        </xsl:copy>
      </xsl:if>
   </xsl:template>   


</xsl:stylesheet>
Treemonkey
  • ?? `$removeMe` isn't defined anywhere. – Daniel Haley Feb 21 '12 at 22:24
  • Wouldn't this only filter a single element name? – Jan Thomä Feb 21 '12 at 22:25
  • @JanThomä no because its recursive template – Treemonkey Feb 21 '12 at 22:26
  • Ahh...you originally had `removeMe` named `removeElementsNamed`. – Daniel Haley Feb 21 '12 at 22:27
  • This code does a few unwanted things: 1. Removes an element named `RemoveMe`. 2. Uses an undefined variable. 3. Even in case the variable were defined this code would remove not only elements but attributes, too. – Dimitre Novatchev Feb 21 '12 at 22:29
  • I think what @JanThomä is saying is that it only filters a single element name passed as the parameter. (The OP implies multiple names based on the plural usage of "elements" in $removeElementsNamed) – Daniel Haley Feb 21 '12 at 22:29
  • Indeed i was referring to that. My solution isn't particulary clean either though ... – Jan Thomä Feb 21 '12 at 22:31
  • Treemonkey, you have corrected some of the issues, but not all. – Dimitre Novatchev Feb 21 '12 at 22:33
  • @DimitreNovatchev can you explain to me how your answer is different, from what i see its the same but in 2 templates! – Treemonkey Feb 21 '12 at 22:38
  • @Treemonkey: Oh, but it is obvious and I already mentioned it in my first comment 18 minutes ago -- issue 3. still stands unresolved. Your code in its present form deletes not only elements, but also attributes. – Dimitre Novatchev Feb 21 '12 at 22:50
  • @Treemonkey: present version of your answer does not distinguish between attributes and elements, meaning it cannot be used to remove all "time" elements without simultaneously removing all "time" attributes. My original code in the question has the same problem. Will fix. – Witman Mar 02 '12 at 02:25
-1

This is somewhat hacky, but it might give you the general idea:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:param name="removeElementsNamed"/>

<xsl:template match="node() | @*">
    <xsl:copy>
        <xsl:apply-templates select="node() | @*"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="*[contains($removeElementsNamed, concat(',',name(),','))]"/>

</xsl:stylesheet>

You need to specify the element names to remove as a comma-separated list that starts and ends with a comma, e.g. the value ",foo,bar,baz," will remove all elements named foo, bar, or baz. If none of your element names is a substring of another element name, you can simplify this to:

<xsl:template match="*[contains($removeElementsNamed,name())]"/>

However if you have an XML like

<foo>
  <bar>..</bar>
  <barbara>..</barbara>
</foo>

and use "bar" as parameter, this will delete both the bar and barbara tags, so the first approach is safer.
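
As the comments note, a variable reference inside a match pattern is rejected by XSLT 1.0 processors. One way the delimiter idea might be kept in XSLT 1.0 is to move the test into the template body and let the stylesheet add the outer commas itself, so the caller can pass a plain list such as "foo,bar". The following is only a sketch: the variable name vDelimited and the explicit priority are illustrative choices, and it assumes the list contains no spaces.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output omit-xml-declaration="yes" indent="yes"/>

<!-- Assumed input form: comma-separated names without spaces, e.g. "foo,bar" -->
<xsl:param name="removeElementsNamed" select="'bar'"/>

<!-- Wrap the list once so that every name has a comma on both sides -->
<xsl:variable name="vDelimited" select="concat(',', $removeElementsNamed, ',')"/>

<xsl:template match="node() | @*" name="identity">
    <xsl:copy>
        <xsl:apply-templates select="node() | @*"/>
    </xsl:copy>
</xsl:template>

<!-- The pattern holds no variable reference; the test lives in the body.
     The explicit priority avoids a tie with the identity rule. -->
<xsl:template match="*" priority="1">
    <xsl:if test="not(contains($vDelimited, concat(',', name(), ',')))">
        <xsl:call-template name="identity"/>
    </xsl:if>
</xsl:template>

</xsl:stylesheet>
```

With the parameter value "bar", the lookup string is ",bar," and the wrapped list is also ",bar,", so the bar element is dropped, while ",barbara," is not found and barbara is kept.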

Jan Thomä
  • As Dimitre noted, this throws an error. Processing it with msxml3.dll stops with error: "Variables may not be used within this expression." and refers to match="*[contains($removeElementsnamed... – Witman Mar 02 '12 at 02:14
  • I see, this solution doesn't work with XSLT 1.0, then again this was not specified as a requirement :) – Jan Thomä Mar 05 '12 at 13:10
  • Thanks for your answer. I'm sorry the question wasn't more clear to begin with; I didn't know anything about XSLT 2.0 until DevNull brought it up... learning every day, and hoping to use XSLT 2.0 in the future. – Witman Mar 06 '12 at 14:25