
This issue is similar to the one in "how to unescape xml with xslt", but slightly different, because my text comes from a processing instruction.

I have an instruction like this:

<?xm-mark data="&lt;p>Here is the text&lt;/p>" ?>

I would like to output the data part with the &lt; unescaped. My attempt so far is:

<xsl:template match="processing-instruction('xm-mark')">
  <mymark>
  <xsl:value-of select="substring-before(substring-after(., 'data=&quot;'), '&quot;')"
  disable-output-escaping="yes" />
  </mymark>
</xsl:template>

However, this gives me back the text as &lt;p>. If I remove disable-output-escaping="yes", I get back &amp;lt; (double-encoded, as I would expect). Since I can't put a value-of around the value-of in my template, does anyone have an idea how I can unescape the data?
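To see where the double encoding comes from, here is a minimal Python sketch (outside XSLT) that mimics the stylesheet's XPath. The helper names `substring_after`, `substring_before` and `pi_data`, and the empty `<root/>` element, are assumptions for illustration only. It shows that the PI's content is never scanned for entity references, so the extracted string still contains a literal `&lt;`, and that writing it out as a text node escapes the `&` a second time.

```python
from xml.dom import minidom
from xml.sax.saxutils import escape

# The PI from the question, placed before a hypothetical empty root element.
DOC = '<?xm-mark data="&lt;p>Here is the text&lt;/p>" ?><root/>'


def substring_after(s: str, sep: str) -> str:
    """Like XPath substring-after(): empty string if sep is absent."""
    _, found, rest = s.partition(sep)
    return rest if found else ""


def substring_before(s: str, sep: str) -> str:
    """Like XPath substring-before(): empty string if sep is absent."""
    head, found, _ = s.partition(sep)
    return head if found else ""


def pi_data(doc: str, target: str) -> str:
    """Return the raw content of the first top-level PI with this target."""
    dom = minidom.parseString(doc)
    for node in dom.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == target):
            return node.data
    raise LookupError(f"no <?{target} ...?> found")


# PI content is not scanned for entity references, so "&lt;" stays literal.
raw = substring_before(substring_after(pi_data(DOC, "xm-mark"), 'data="'), '"')

# Writing that string as a text node escapes the "&" again: double encoding.
print(raw)          # &lt;p>Here is the text&lt;/p>
print(escape(raw))  # &amp;lt;p&gt;Here is the text&amp;lt;/p&gt;
```

With disable-output-escaping the first form is written verbatim, which an XML reader then sees as the text "<p>", not as a p element.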

Wavel
  • I think there is a misunderstanding here: `xsl:value-of` outputs **text nodes**, and in a well-formed XML document the special characters `&` and `<` in text nodes are encoded as **character entities**. If you want some string to be output as a possibly non-well-formed pseudo-XML "fragment", then use the DOE mechanism. If some smart designer chose to re-encode this unparsed pseudo-XML string, translating every `&` into `&amp;`, then you need to pre-decode it, of course. This was already answered some time ago... I will search... –  Jan 24 '11 at 23:46
  • This is a tragic case of destroying markup. See my answer for an analysis and a recommended solution. – Dimitre Novatchev Jan 25 '11 at 04:33
  • Just to be clear, I did not create the XML with all the processing instructions and encoded characters, and we have no leverage to get the supplier to change it. It would be easy for me to create an extension in .NET to do the decoding, but I was hoping for an all-XSLT solution. – Wavel Jan 25 '11 at 18:07

2 Answers


This is what you get when you destroy markup by converting it to text.

Remember never to "architect" such horrible things.

Also, resorting to DOE is a sign of desperation and is not guaranteed to work at all: DOE is not a mandatory feature, and some major XSLT 1.0 processors, such as the one used by Firefox, don't implement it.

So, what other alternative is there?

One possible solution is to write an extension function (there is no such standard function in XSLT/XPath 1.0 or 2.0) that takes a string, parses it as XML and returns the resulting XML document. It would be used like this:

  <xsl:copy-of select=
      "xx:parse(substring-before(substring-after(., 'data=&quot;'), '&quot;'))/*"/>
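What such an extension function has to do can be sketched in Python. `parse_escaped` is a hypothetical stand-in for `xx:parse`, not an existing library function. Because the string pulled out of the PI is still entity-encoded, the sketch first lets an XML parser decode it (by wrapping it in a throwaway element), then parses the decoded markup, assuming it has a single root element.

```python
import xml.etree.ElementTree as ET


def parse_escaped(fragment: str) -> ET.Element:
    """Hypothetical stand-in for the xx:parse() extension function.

    fragment is the escaped string pulled out of the PI, e.g.
    '&lt;p>Here is the text&lt;/p>'.
    """
    # Step 1: let an XML parser decode the entity references by wrapping the
    # string in a throwaway element and reading back its text content.
    decoded = ET.fromstring("<wrapper>" + fragment + "</wrapper>").text or ""
    # Step 2: parse the decoded markup itself. This assumes it has a single
    # root element; otherwise ET.ParseError is raised.
    return ET.fromstring(decoded)


elem = parse_escaped("&lt;p>Here is the text&lt;/p>")
print(elem.tag, "|", elem.text)  # p | Here is the text
```

Using the parser for step 1, rather than string replacement, also decodes numeric character references. In .NET, which the asker uses, a function like this would typically be exposed to `XslCompiledTransform` as an extension object registered through `XsltArgumentList.AddExtensionObject`.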
Dimitre Novatchev
  • Since I process the XML files using .NET, I will probably create an extension in C# that decodes the string. I was trying for a purer XSLT solution. And, just to be clear, I didn't create the XML! You should see the other messes that are in it :) – Wavel Jan 25 '11 at 18:04
  • @Wavel: Yes, and this is the solution I recommend. – Dimitre Novatchev Jan 25 '11 at 18:11

Processing instructions don't require anything to be escaped; they're parsed similarly to comments, in that anything between the <? and ?> is treated exactly as-is. If you can, amend whatever's generating that instruction to generate this instead:

<?xm-mark data="<p>Here is the text</p>" ?>

If you can't do that, I wouldn't even attempt to use XSLT to parse it.

EDIT: I should probably clarify, as you're likely making things more complicated than you need to here: a processing instruction doesn't have attributes, and even the quotation marks and the space at the end are part of the 'value' of the processing-instruction node. In the instruction above, you've actually got a processing instruction with the name (target) xm-mark and the value data="<p>Here is the text</p>" (including a space at the end, which doesn't display here); data is as much a part of the value as the <p>..</p> part.

In your case <?xm-mark <p>Here is the text</p>?> is probably enough; the value of the processing-instruction node is then just <p>Here is the text</p>, which is all you're likely to be interested in.
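The point that a PI has only a target and a single string value can be checked with Python's `xml.dom.minidom`, which exposes exactly those two properties (`target` and `data`) on a processing-instruction node. This is a minimal sketch; `pi_parts` and the wrapping `<root>` element are illustrative assumptions.

```python
from xml.dom import minidom


def pi_parts(doc: str) -> tuple[str, str]:
    """Return (target, data) of the first PI inside the root element."""
    dom = minidom.parseString(doc)
    for node in dom.documentElement.childNodes:
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE:
            return node.target, node.data
    raise ValueError("no processing instruction found")


# The suggested form: data=... is just text inside the PI, not an attribute.
with_pseudo_attr = '<root><?xm-mark data="<p>Here is the text</p>" ?></root>'
# The simpler form: the markup alone is the PI's value.
bare = '<root><?xm-mark <p>Here is the text</p>?></root>'

print(pi_parts(with_pseudo_attr))
print(pi_parts(bare))  # ('xm-mark', '<p>Here is the text</p>')
```

In the first form the `data="..."` text, quotes included, is part of the value, so it still has to be stripped off with substring functions; in the second form the value is the markup string itself.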

EDIT: Ouch. Well, you could try this:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Replace each xm-mark PI with a mymark element holding its decoded data. -->
  <xsl:template match="processing-instruction('xm-mark')">
    <xsl:element name="mymark">
      <xsl:call-template name="unescape">
        <!-- The text between data=" and the next quotation mark -->
        <xsl:with-param name="input" select="substring-before(substring-after(., 'data=&quot;'), '&quot;')" />
      </xsl:call-template>
    </xsl:element>
  </xsl:template>

  <!-- Recursively replace every literal "&lt;" in $input with a "<" character. -->
  <xsl:template name="unescape">
    <xsl:param name="input" />
    <xsl:choose>
      <xsl:when test="contains($input, '&amp;lt;')">
        <xsl:call-template name="unescape">
          <xsl:with-param name="input" select="substring-before($input, '&amp;lt;')" />
        </xsl:call-template>
        <xsl:text>&lt;</xsl:text>
        <xsl:call-template name="unescape">
          <xsl:with-param name="input" select="substring-after($input, '&amp;lt;')" />
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$input" />
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Identity template: copy everything else unchanged. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

NB: Because the & is taken as text rather than markup, in the XSLT you need to write &amp; to refer to it. Hence, your processing instruction's value would actually be represented as &amp;lt;p&gt;etc. if it were output as-is as text in an XML document. The XSL above will at least convert that to &lt;p&gt;etc., but if you want actual p elements, use an extension method.
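To check what the recursive `unescape` template produces, its logic can be mirrored in Python. `unescape_lt` is a hypothetical name; like the template, it only handles `&lt;`. Serializing the result as a text node (here with `xml.sax.saxutils.escape`) re-escapes the `<`, which is why the output is `&lt;p&gt;` text rather than real p elements.

```python
from xml.sax.saxutils import escape


def unescape_lt(text: str) -> str:
    """Mirror of the recursive 'unescape' named template.

    If text contains a literal '&lt;', recurse on the part before it,
    emit '<', then recurse on the part after it; otherwise return text.
    """
    token = "&lt;"
    if token in text:
        before, _, after = text.partition(token)
        return unescape_lt(before) + "<" + unescape_lt(after)
    return text


decoded = unescape_lt("&lt;p>Here is the text&lt;/p>")
print(decoded)          # <p>Here is the text</p>
print(escape(decoded))  # &lt;p&gt;Here is the text&lt;/p&gt;
```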

Flynn1179
  • I wish I could amend whatever was generating the instruction! Ahhh, for a perfect world :) Unfortunately, I get what I get and have to work with it that way. – Wavel Jan 25 '11 at 18:02
  • +1 for pointing out that a PI's value doesn't require special characters to be escaped. –  Jan 25 '11 at 22:41
  • @Wavel: It's a bad solution, but you could replace the `&lt;` string with the `<` character (and so on for the rest of the special characters) before DOE. –  Jan 25 '11 at 22:44
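The suggestion in the last comment (replace each escaped entity with its character before applying DOE) depends on the order of the replacements. A minimal Python sketch, assuming only the five predefined XML entities occur (numeric references such as `&#60;` are not handled): `&amp;` has to be replaced last, otherwise `&amp;lt;` would be decoded twice into `<`. `decode_predefined` is a hypothetical helper; in XSLT 1.0 each replacement would need a recursive template like the one in the answer above, while XPath 2.0 offers `replace()`.

```python
# The five predefined XML entities, with &amp; deliberately last so that
# an escaped entity such as "&amp;lt;" is decoded only once.
ENTITIES = [
    ("&lt;", "<"),
    ("&gt;", ">"),
    ("&quot;", '"'),
    ("&apos;", "'"),
    ("&amp;", "&"),
]


def decode_predefined(text: str) -> str:
    """Replace each predefined entity reference with its character."""
    for entity, char in ENTITIES:
        text = text.replace(entity, char)
    return text


print(decode_predefined("&lt;p>Here is the text&lt;/p>"))  # <p>Here is the text</p>
print(decode_predefined("&amp;lt;"))                        # &lt;
```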