
I am trying to use XSLT to remove ancestor tags (and their children) when they have empty text and a specific attribute value. I have XSLT that checks the text() of each node and when it is empty and the ancestor has attribute deltaxml:deltaV2="A" I want to remove the ancestor and children nodes.

Here are the XML tags I want to remove (note: the ancestor can be any element, not just 'p'). In this case I want the last p tag and its children removed:

<body>
  <p deltaxml:deltaV2="A=B">
    <t>This is the same</t>
  </p>
  <p deltaxml:deltaV2="B">
    <t>This is inserted</t>
  </p>
  <p deltaxml:deltaV2="A">
    <t>This is deleted</t>
  </p>
  <p deltaxml:deltaV2="A">
    <t> </t>
  </p>
</body>

And here is the XSLT I have so far:

<xsl:template match="@* | * | processing-instruction() | comment()" mode="#all">
  <xsl:copy copy-namespaces="no">
    <xsl:apply-templates select="@*, node()" mode="#current"/>
  </xsl:copy>
</xsl:template> 
<xsl:template match="text()">
   <xsl:variable name="deltaV2" as="attribute()" select="ancestor::*[@deltaxml:deltaV2][1]/@deltaxml:deltaV2"/>
   <xsl:variable name="text" select="."/>
<xsl:choose>
  <xsl:when test="$deltaV2 eq 'A'">
    <xsl:choose>
      <xsl:when test="$text = ' '">

        <!-- need to remove ancestor tags-->

      </xsl:when>
      <xsl:otherwise>
        <xsl:element name="delete" namespace="{$root-ns}">
          <xsl:value-of select="."/>
        </xsl:element>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:when>
  <xsl:when test="$deltaV2 eq 'B'">
    <xsl:element name="insert" namespace="{$root-ns}">
      <xsl:value-of select="."/>
    </xsl:element>
  </xsl:when>
  <xsl:otherwise>
    <xsl:value-of select="."/>
  </xsl:otherwise>
</xsl:choose>
</xsl:template>

Here is desired output:

<body>
  <p deltaxml:deltaV2="A=B">
    <t>This is the same</t>
  </p>
  <p deltaxml:deltaV2="B">
    <t><insert>This is inserted</insert></t>
  </p>
  <p deltaxml:deltaV2="A">
    <t><delete>This is deleted</delete></t>
  </p>
</body>

I need this because those attributes show whether something was inserted or deleted between two versions of the XML. If a node is empty (i.e., the empty t tags in the sample), I don't want to track it as a change, since no text has changed; I just want it removed. What do I need to put in the branch where the text is empty to remove those tags?

Developer Guy
  • Your input is not well-formed XML and neither is your output: you cannot use a prefix (deltaxml:) without binding to a namespace first. – michael.hor257k Aug 03 '16 at 21:03
  • These were just snippets of my code, the actual files are much larger so I just took out the areas where I was trying to make changes. – Developer Guy Aug 03 '16 at 21:06
  • Reducing the size of the example is fine, even welcome - but you need to make sure the example is complete and well formed, otherwise it's useless for testing - see: [mcve]. – michael.hor257k Aug 03 '16 at 21:08

2 Answers


I am trying to use XSLT to remove [...] tags (and their children) when they have empty text and a specific attribute value.

The template that does that is very simple:

<xsl:template match="*[@deltaxml:deltaV2 = 'A' and normalize-space() = '']" />

Use it along with the identity transform. Read about the identity transform here: http://www.dpawson.co.uk/xsl/sect2/identity.html (among countless other examples that your favorite search engine will provide).

This question here on SO also provides a canonical answer to the same problem you are describing: How to remove elements from xml using xslt with stylesheet and xsltproc?
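As a minimal sketch of how the pieces could fit together for the sample in the question: the stylesheet below combines the identity transform with the empty template above, plus two text() templates that wrap text in delete/insert elements. The namespace URI http://example.com/deltaxml is a placeholder, and putting delete/insert in no namespace is an assumption (the question uses a $root-ns variable that is not shown).

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:deltaxml="http://example.com/deltaxml">
  <!-- The deltaxml URI above is a placeholder; use the one from your input. -->

  <!-- Drop whitespace-only text nodes so indentation is never wrapped. -->
  <xsl:strip-space elements="*"/>

  <!-- Identity transform: copy anything no other template matches. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Remove "A" elements (and their children) that contain no text. -->
  <xsl:template match="*[@deltaxml:deltaV2 = 'A' and normalize-space() = '']"/>

  <!-- Text whose nearest marked ancestor is "A": wrap in delete. -->
  <xsl:template match="text()[ancestor::*[@deltaxml:deltaV2][1]/@deltaxml:deltaV2 = 'A']">
    <delete><xsl:value-of select="."/></delete>
  </xsl:template>

  <!-- Text whose nearest marked ancestor is "B": wrap in insert. -->
  <xsl:template match="text()[ancestor::*[@deltaxml:deltaV2][1]/@deltaxml:deltaV2 = 'B']">
    <insert><xsl:value-of select="."/></insert>
  </xsl:template>

</xsl:stylesheet>
```

The removal happens at the element level, before any of its text nodes are visited, so the text() templates never need to reach back up to an ancestor.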

Tomalak
  • That didn't seem to remove any of the empty nodes like I need it to. – Developer Guy Aug 03 '16 at 20:17
  • To help you I need a (condensed, but complete) input sample, the complete XSLT you are trying it with, and the exact output you expect for the input sample. – Tomalak Aug 03 '16 at 20:28
  • You did follow the advice about the identity transform, right? Start with a stylesheet that is very similar to the one in the linked question and take it from there step by step. Anyway, I'm out for today, I'll have another look tomorrow. – Tomalak Aug 03 '16 at 20:36
  • After doing some more digging into the issue, your answer appears to work at least with the sample I provided. However, it seems like these empty spaces in my XML are being treated as text (or at least not empty) for some reason which is why it isn't working for that.

    ��

    this is what is showing up when I ran my code on an online XSLT tester - http://xslttest.appspot.com/. So I believe that for whatever reason, these aren't really empty spaces; do you have any idea what they could be?
    – Developer Guy Aug 04 '16 at 14:54
  • This is difficult to tell from afar. It looks like a character encoding issue. To find out what those "blank-looking" characters really are, you could open your original XML in a hex editor and get their character codes. `normalize-space()` only deals with "traditional" whitespace (character codes 32, 9, 10 and 13). Unicode defines some extra characters that look like whitespace. Maybe you are seeing some of those. – Tomalak Aug 04 '16 at 15:37
  • When I opened this in the hex editor with notepad++ I added a whitespace node manually which had a value of: 20 and looked like this:

    and the nodes that were native to the document had a value of: c2 a0 and looked like this:

    Â.

    . Not sure what all this means.
    – Developer Guy Aug 04 '16 at 15:55
  • The byte sequence C2 A0 is how the non-breaking space appears in UTF-8 encoding (character code 160, or 0xA0 in hex notation). Compare http://www.fileformat.info/info/unicode/char/00a0/index.htm, next to "UTF-8 (hex)". This is one of the characters that will not be stripped by `normalize-space()`. If you want to get rid of it, you can use `translate()` to remove it: `normalize-space(translate(., '&#160;', ''))`. Note that this snippet of XPath only runs in XSLT, because the `&#160;` sequence is translated by the XML parser, not by XPath (XPath itself has no string escaping). – Tomalak Aug 04 '16 at 18:01
  • @JustinR And just to complete the picture: `Â.` is what you get when you interpret the bytes `C2 A0` not as UTF-8, but as Windows-1252. UTF-8 knows that those two bytes belong together and refer to a single character. Windows-1252 doesn't and interprets them individually. – Tomalak Aug 04 '16 at 18:33
  • Thanks for the clarification, this seemed to fix the problems I was having. Quite the workaround for something I initially thought would be so simple. – Developer Guy Aug 04 '16 at 18:59
  • Oh, but the final solution is simple, is it not? At least a great deal simpler than what you have been attempting. You just did not have all the puzzle pieces. (If it's not radically simpler than the attempt in your question, post it and I'll have a look.) – Tomalak Aug 04 '16 at 19:12
  • Oh no you are definitely correct, much much simpler (a one liner in fact). Once you told me about what was causing the problems it helped a lot. – Developer Guy Aug 04 '16 at 19:22
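The non-breaking-space fix from the comments above can be folded into the match pattern itself. This is only a sketch: the namespace URI is a placeholder, and it assumes U+00A0 is the only non-standard blank character in the input.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:deltaxml="http://example.com/deltaxml">
  <!-- The deltaxml URI above is a placeholder; use the one from your input. -->

  <!-- Identity transform: copy anything no other template matches. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Remove "A" elements whose text is empty once non-breaking spaces
       (&#160;) are deleted and ordinary whitespace is normalized. -->
  <xsl:template match="*[@deltaxml:deltaV2 = 'A'
                         and normalize-space(translate(., '&#160;', '')) = '']"/>

</xsl:stylesheet>
```

If other blank-looking characters turn up in the hex editor, they can be appended to the second argument of translate() in the same way.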

You cannot remove an ancestor in a template that processes the descendant. By the time your stylesheet gets to the empty text() node, the p ancestor has already been processed and written to the output tree - see: https://www.w3.org/TR/xslt/#section-Processing-Model

If you restate your requirement as: remove any element (and all its descendants) that satisfies both:

  1. has an attribute named deltaxml:deltaV2 with a value of "A";
  2. has at least one descendant element with no non-whitespace text node,

then you can implement this as:

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:deltaxml="http://example.com/deltaxml">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<!-- identity transform -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="*[@deltaxml:deltaV2='A'][.//*[not(normalize-space())]]" />

</xsl:stylesheet>

Of course, you need to bind the deltaxml: prefix to the same namespace URI used in your input XML, not an arbitrary one as I did here.
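For testing, a well-formed version of the question's input could look like the following. It assumes the same placeholder URI, http://example.com/deltaxml, as the stylesheet above; the real URI has to be taken from the actual input document.

```xml
<!-- The xmlns:deltaxml declaration is what makes the prefix legal;
     the URI here is a placeholder, not the real DeltaXML namespace. -->
<body xmlns:deltaxml="http://example.com/deltaxml">
  <p deltaxml:deltaV2="A=B">
    <t>This is the same</t>
  </p>
  <p deltaxml:deltaV2="B">
    <t>This is inserted</t>
  </p>
  <p deltaxml:deltaV2="A">
    <t>This is deleted</t>
  </p>
  <p deltaxml:deltaV2="A">
    <t> </t>
  </p>
</body>
```

Run through the stylesheet above, the last p element should be dropped, since its only descendant element contains nothing but whitespace.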

michael.hor257k