I am working with an XML file that has raw HTML stored inside a node's attribute (<node data="HTML...">).

I just realized that the HTML is double-encoded, so that, instead of being:

&lt;div&gt;

It is actually written as:

&amp;lt;div&amp;gt;

This means that if I do something like:

<xsl:value-of select="node/@data" disable-output-escaping="yes" />

I will still get a (singly) escaped value:

&lt;div&gt;

What's the easiest way of unescaping this once again?

Amberite

1 Answer

It's definitely not pretty, but basically you are looking at a limited number of string-replace operations:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" encoding="utf-8" />

  <xsl:variable name="ampDbl" select="'&amp;amp;'" />
  <xsl:variable name="amp" select="'&amp;'" />
  <xsl:variable name="ltDbl" select="'&amp;lt;'" />
  <xsl:variable name="lt" select="'&lt;'" />
  <xsl:variable name="gtDbl" select="'&amp;gt;'" />
  <xsl:variable name="gt" select="'&gt;'" />

  <xsl:template match="/">
    <xsl:apply-templates select="//@data" mode="unescape" />
  </xsl:template>

  <xsl:template match="@data" mode="unescape">
    <xsl:variable name="step1">
      <xsl:call-template name="StringReplace">
        <xsl:with-param name="s" select="string()" />
        <xsl:with-param name="search" select="$ltDbl" />
        <xsl:with-param name="replace" select="$lt" />
      </xsl:call-template>
    </xsl:variable>
    <xsl:variable name="step2">
      <xsl:call-template name="StringReplace">
        <xsl:with-param name="s" select="$step1" />
        <xsl:with-param name="search" select="$gtDbl" />
        <xsl:with-param name="replace" select="$gt" />
      </xsl:call-template>
    </xsl:variable>
    <xsl:variable name="step3">
      <xsl:call-template name="StringReplace">
        <xsl:with-param name="s" select="$step2" />
        <xsl:with-param name="search" select="$ampDbl" />
        <xsl:with-param name="replace" select="$amp" />
      </xsl:call-template>
    </xsl:variable>
    <xsl:value-of select="$step3" disable-output-escaping="yes" />
  </xsl:template>

  <!-- generic string replace template -->
  <xsl:template name="StringReplace">
    <xsl:param name="s"       select="''" />
    <xsl:param name="search"  select="''" />
    <xsl:param name="replace" select="''" />

    <xsl:choose>
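      <!-- the string($search) guard avoids endless recursion on an empty search string -->
      <xsl:when test="string($search) and contains($s, $search)">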
        <xsl:value-of select="substring-before($s, $search)" />
        <xsl:value-of select="$replace" />
        <xsl:variable name="rest" select="substring-after($s, $search)" />
        <xsl:if test="$rest">
          <xsl:call-template name="StringReplace">
            <xsl:with-param name="s"       select="$rest" />
            <xsl:with-param name="search"  select="$search" />
            <xsl:with-param name="replace" select="$replace" />
          </xsl:call-template>
        </xsl:if>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$s" />
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>

When applied to

<root>
  <node data="&amp;lt;div&amp;gt;bla &amp;amp;amp; bla&amp;lt;/div&amp;gt;" />
</root>

it gives (in source code):

<div>bla &amp; bla</div>

which of course becomes this on screen:

bla & bla

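You might want to add another step for '&amp;quot;' to '&quot;'. Run it before the '&amp;amp;' step, so that ampersands are still unescaped last and nothing gets unescaped twice.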
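A minimal sketch of that extra quote step, reusing the StringReplace template and the variable style of the stylesheet above. The names quotDbl, quot and stepQuot are made up for illustration and are not part of the original stylesheet. The fragment is meant to be merged into the stylesheet rather than run on its own, and it slots the new step in between step2 and step3:

```xml
<!-- Top level, next to the other search/replace variables (hypothetical names) -->
<xsl:variable name="quotDbl" select="'&amp;quot;'" />
<xsl:variable name="quot" select="'&quot;'" />

<!-- Inside the @data template: new step, fed by $step2 -->
<xsl:variable name="stepQuot">
  <xsl:call-template name="StringReplace">
    <xsl:with-param name="s" select="$step2" />
    <xsl:with-param name="search" select="$quotDbl" />
    <xsl:with-param name="replace" select="$quot" />
  </xsl:call-template>
</xsl:variable>

<!-- The existing ampersand step now reads from $stepQuot instead of $step2 -->
<xsl:variable name="step3">
  <xsl:call-template name="StringReplace">
    <xsl:with-param name="s" select="$stepQuot" />
    <xsl:with-param name="search" select="$ampDbl" />
    <xsl:with-param name="replace" select="$amp" />
  </xsl:call-template>
</xsl:variable>
<xsl:value-of select="$step3" disable-output-escaping="yes" />
```

Keeping the ampersand replacement as the final step means a value that was escaped one more time than the rest only loses one level of escaping.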

Tomalak
  • FWIW, I did the opposite operation (double-escaping output) a long time ago in [another answer](http://stackoverflow.com/a/2652733/18771) – Tomalak Nov 01 '13 at 21:53
  • Wow, you're right, it's not pretty, but it works :) Until someone can come up with a better way, going to mark this as the answer. Thanks! – Amberite Nov 02 '13 at 20:08
  • @Amberite If your XSLT processor supports extension functions (like [EXSLT](http://www.exslt.org/) or one of the proprietary extension methods, like the one .NET provides or the MSXSL script extensions) then you might be able to off-load the heavy lifting to an external routine that is better suited to string processing and HTML parsing. The above is the vanilla XSLT 1.0 method. If you have more than vanilla XSLT 1.0, by all means make use of it. – Tomalak Nov 02 '13 at 20:25
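As one illustration of "more than vanilla XSLT 1.0": if the processor happens to support XSLT 2.0 (an assumption, since the thread does not say which processor is in use), the built-in replace() function can stand in for the recursive StringReplace template. This is a sketch under that assumption, not the method from the answer. It keeps the same replacement order, with the ampersand step last, and still relies on disable-output-escaping, which is an optional feature that the serializer has to support.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" encoding="utf-8" />

  <xsl:template match="/">
    <xsl:apply-templates select="//@data" mode="unescape" />
  </xsl:template>

  <!-- Nested replace() calls: &lt; first, then &gt;, and &amp; last (outermost call) -->
  <xsl:template match="@data" mode="unescape">
    <xsl:value-of
      select="replace(replace(replace(., '&amp;lt;', '&lt;'), '&amp;gt;', '&gt;'), '&amp;amp;', '&amp;')"
      disable-output-escaping="yes" />
  </xsl:template>
</xsl:stylesheet>
```

Note that replace() treats its second argument as a regular expression. The three search strings used here contain no regex metacharacters, so they match literally.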