
My question is how to unescape XML that has already been escaped.

I tried the code provided by Tomalak in response to "How to unescape XML characters with help of XSLT?", but I can't get it to do what I want.

I have a SOAP message. The body contains a few elements, one of which is a string. This string contains escaped XML. This is often done in RPC-style SOAP messages because they don't allow complex types. To get around this, they embed escaped XML inside a string element; see sXmlParameters in the input below.

Example Input:

<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:pan="http://wsdl.somebody.com/Stuff/">
  <soap:Header />
  <soap:Body>
    <pan:SomeCommand>
      <first>eefbb52a0fee443cbda838caffbc2654</first>
      <second>f26eb2f5dabc457ca045e64585f7b185</second>
      <sXmlParameters>&lt;PARAMETERS&gt;&lt;TIMEOUTDATETIME&gt;2011-03-15
        2:09:48.997&lt;/TIMEOUTDATETIME&gt;&lt;/PARAMETERS&gt;</sXmlParameters>
    </pan:SomeCommand>
  </soap:Body>
</soap:Envelope>

I also see this data escaped with a CDATA section (<![CDATA[ ... ]]>); I need to unescape that too.

Converted Output:

<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:pan="http://wsdl.somebody.com/Stuff/">
  <soap:Header />
  <soap:Body>
    <pan:SomeCommand>
      <first>eefbb52a0fee443cbda838caffbc2654</first>
      <second>f26eb2f5dabc457ca045e64585f7b185</second>
      <sXmlParameters>
        <PARAMETERS>
           <TIMEOUTDATETIME>2011-03-15 2:09:48.997</TIMEOUTDATETIME>
        </PARAMETERS>
      </sXmlParameters>
    </pan:SomeCommand>
  </soap:Body>
</soap:Envelope>
Community
user668595
  • You can edit your question. Don't post an answer instead. This is a Q/A site, not a forum. – dandan78 Mar 21 '11 at 18:20
  • @user668595: SOAP allows embedded vocabulary. Why this terrible design choice then? –  Mar 23 '11 at 00:36

3 Answers


This will already take care of half of your problem – not the CDATA part:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="//sXmlParameters">
        <xsl:copy>
            <xsl:call-template name="unescape">
                <xsl:with-param name="escaped" select="string(.)"/>
            </xsl:call-template>
        </xsl:copy>
    </xsl:template>

    <xsl:template name="unescape">
        <xsl:param name="escaped"/>
        <xsl:choose>
            <xsl:when test="contains($escaped,'&lt;')">
                <xsl:variable name="beforeelem" select="substring-before($escaped,'&lt;')"/>
                <xsl:variable name="elemname1" select="substring-before(substring-after($escaped,'&lt;'),' ')"/>
                <xsl:variable name="elemname2" select="substring-before(substring-after($escaped,'&lt;'),'&gt;')"/>
                <xsl:variable name="elemname3" select="substring-before(substring-after($escaped,'&lt;'),'/&gt;')"/>
                <xsl:variable name="hasattributes" select="string-length($elemname1) &gt; 0 and ((string-length($elemname2)=0 or string-length($elemname1) &lt; string-length($elemname2)) and (string-length($elemname3)=0 or string-length($elemname1) &lt; string-length($elemname3)))"/>
                <xsl:variable name="elemclosed" select="string-length($elemname3) &gt; 0 and (string-length($elemname2)=0 or string-length($elemname3) &lt; string-length($elemname2))"/>
                <xsl:variable name="elemname">
                    <xsl:choose>
                        <xsl:when test="$hasattributes">
                            <xsl:value-of select="$elemname1"/>
                        </xsl:when>
                        <xsl:when test="not($elemclosed)">
                            <xsl:value-of select="$elemname2"/>
                        </xsl:when>
                        <xsl:otherwise>
                            <xsl:value-of select="$elemname3"/>
                        </xsl:otherwise>
                    </xsl:choose>
                </xsl:variable>
                <xsl:variable name="elemclosetag" select="concat('&lt;/',$elemname,'&gt;')"/>
                <xsl:variable name="innercontent">
                    <xsl:if test="not($elemclosed)">
                        <xsl:call-template name="skipper-before">
                            <xsl:with-param name="source" select="substring-after(substring-after($escaped,'&lt;'),'&gt;')"/>
                            <xsl:with-param name="delimiter" select="$elemclosetag"/>
                        </xsl:call-template>
                    </xsl:if>
                </xsl:variable>
                <xsl:variable name="afterelem">
                    <xsl:choose>
                        <xsl:when test="not($elemclosed)">
                            <xsl:call-template name="skipper-after">
                                <xsl:with-param name="source" select="substring-after(substring-after($escaped,'&lt;'),'&gt;')"/>
                                <xsl:with-param name="delimiter" select="$elemclosetag"/>
                            </xsl:call-template>
                        </xsl:when>
                        <xsl:otherwise>
                            <xsl:value-of select="substring-after(substring-after($escaped,'&lt;'),'/&gt;')"/>
                        </xsl:otherwise>
                    </xsl:choose>
                </xsl:variable>
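                <!-- output any text that precedes the element -->
                <xsl:call-template name="unescapetext">
                    <xsl:with-param name="escapedtext" select="$beforeelem"/>
                </xsl:call-template>
                <xsl:element name="{$elemname}">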
                    <xsl:if test="$hasattributes">
                        <xsl:call-template name="unescapeattributes">
                            <xsl:with-param name="escapedattributes">
                                <xsl:choose>
                                    <xsl:when test="not($elemclosed)">
                                        <xsl:value-of select="normalize-space(substring-after($elemname2,' '))"/>
                                    </xsl:when>
                                    <xsl:otherwise>
                                        <xsl:value-of select="normalize-space(substring-after($elemname3,' '))"/>
                                    </xsl:otherwise>
                                </xsl:choose>
                            </xsl:with-param>
                        </xsl:call-template>
                    </xsl:if>
                    <xsl:call-template name="unescape">
                        <xsl:with-param name="escaped" select="$innercontent"/>
                    </xsl:call-template>
                </xsl:element>
                <xsl:call-template name="unescape">
                    <xsl:with-param name="escaped" select="$afterelem"/>
                </xsl:call-template>
            </xsl:when>
            <xsl:otherwise>
                <xsl:call-template name="unescapetext">
                    <xsl:with-param name="escapedtext" select="$escaped"/>
                </xsl:call-template>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>

    <xsl:template name="unescapeattributes">
        <xsl:param name="escapedattributes"/>
        <xsl:variable name="attrname" select="substring-before($escapedattributes,'=')"/>
        <xsl:variable name="attrquote" select="substring($escapedattributes,string-length($attrname)+2,1)"/>
        <xsl:variable name="attrvalue" select="substring-before(substring-after($escapedattributes,$attrquote),$attrquote)"/>
        <xsl:variable name="afterattr" select="substring-after(substring-after($escapedattributes,$attrquote),$attrquote)"/>
        <xsl:attribute name="{$attrname}">
            <xsl:call-template name="unescapetext">
                <xsl:with-param name="escapedtext" select="$attrvalue"/>
            </xsl:call-template>
        </xsl:attribute>
        <xsl:if test="contains($afterattr,'=')">
            <xsl:call-template name="unescapeattributes">
                <xsl:with-param name="escapedattributes" select="normalize-space($afterattr)"/>
            </xsl:call-template>
        </xsl:if>
    </xsl:template>

    <xsl:template name="unescapetext">
        <xsl:param name="escapedtext"/>
        <xsl:call-template name="string-replace-all">
            <xsl:with-param name="text">
                <xsl:call-template name="string-replace-all">
                    <xsl:with-param name="text">
                        <xsl:call-template name="string-replace-all">
                            <xsl:with-param name="text" select="$escapedtext"/>
                            <xsl:with-param name="replace">&amp;gt;</xsl:with-param>
                            <xsl:with-param name="by">&gt;</xsl:with-param>
                        </xsl:call-template>
                    </xsl:with-param>
                    <xsl:with-param name="replace">&amp;lt;</xsl:with-param>
                    <xsl:with-param name="by">&lt;</xsl:with-param>
                </xsl:call-template>
            </xsl:with-param>
            <xsl:with-param name="replace">&amp;amp;</xsl:with-param>
            <xsl:with-param name="by">&amp;</xsl:with-param>
        </xsl:call-template>
    </xsl:template>

    <!-- replaces substrings in strings -->
    <xsl:template name="string-replace-all">
        <xsl:param name="text"/>
        <xsl:param name="replace"/>
        <xsl:param name="by"/>
        <xsl:choose>
            <xsl:when test="contains($text, $replace)">
                <xsl:value-of select="substring-before($text,$replace)"/>
                <xsl:value-of select="$by"/>
                <xsl:call-template name="string-replace-all">
                    <xsl:with-param name="text" select="substring-after($text,$replace)"/>
                    <xsl:with-param name="replace" select="$replace"/>
                    <xsl:with-param name="by" select="$by"/>
                </xsl:call-template>
            </xsl:when>
            <xsl:otherwise>
                <xsl:value-of select="$text"/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>

    <!-- returns the substring after the last delimiter -->
    <xsl:template name="skipper-after">
        <xsl:param name="source"/>
        <xsl:param name="delimiter"/>
        <xsl:choose>
            <xsl:when test="contains($source,$delimiter)">
                <xsl:call-template name="skipper-after">
                    <xsl:with-param name="source" select="substring-after($source,$delimiter)"/>
                    <xsl:with-param name="delimiter" select="$delimiter"/>
                </xsl:call-template>
            </xsl:when>
            <xsl:otherwise>
                <xsl:value-of select="$source"/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>

    <!-- returns the substring before the last delimiter -->
    <xsl:template name="skipper-before">
        <xsl:param name="source"/>
        <xsl:param name="delimiter"/>
        <xsl:param name="result"/>
        <xsl:choose>
            <xsl:when test="contains($source,$delimiter)">
                <xsl:call-template name="skipper-before">
                    <xsl:with-param name="source" select="substring-after($source,$delimiter)"/>
                    <xsl:with-param name="delimiter" select="$delimiter"/>
                    <xsl:with-param name="result">
                        <xsl:if test="$result!=''">
                            <xsl:value-of select="concat($result,$delimiter)"/>
                        </xsl:if>
                        <xsl:value-of select="substring-before($source,$delimiter)"/>
                    </xsl:with-param>
                </xsl:call-template>
            </xsl:when>
            <xsl:otherwise>
                <xsl:value-of select="$result"/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>

</xsl:stylesheet>
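A note on the CDATA case: the XPath 1.0 data model does not distinguish CDATA sections from ordinary text, so string(.) on sXmlParameters yields the same characters whether the payload was written with &lt;/&gt; entities or inside a CDATA section. Assuming the CDATA section wraps the raw markup directly (not text that is already entity-escaped), the stylesheet above should therefore accept input like the following without changes. This is an untested sketch built from the question's example:

```xml
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:pan="http://wsdl.somebody.com/Stuff/">
  <soap:Header />
  <soap:Body>
    <pan:SomeCommand>
      <first>eefbb52a0fee443cbda838caffbc2654</first>
      <second>f26eb2f5dabc457ca045e64585f7b185</second>
      <!-- same payload as the question, wrapped in CDATA instead of escaped with entities -->
      <sXmlParameters><![CDATA[<PARAMETERS><TIMEOUTDATETIME>2011-03-15 2:09:48.997</TIMEOUTDATETIME></PARAMETERS>]]></sXmlParameters>
    </pan:SomeCommand>
  </soap:Body>
</soap:Envelope>
```

If the CDATA section instead holds text that is itself entity-escaped (&lt;PARAMETERS&gt; written literally inside the CDATA), the string value contains no literal < character, so the unescape template falls through to unescapetext and only produces escaped text again; that variant needs a different approach.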
mousio
  • The answer should be to parse the unparsed data. Do you think you have a good XML parser here? –  Mar 23 '11 at 00:34
  • @Alejandro: The xslt outputs the requested result (tested for the given input). The xslt is based on some transformations to comment out/uncomment parts of xml documents. Come to think of it, there is more missing than just support for CDATA: apos/quot/processing-instructions/… are not specifically handled. – mousio Mar 23 '11 at 09:33
  • This version doesn't handle empty-element tags with a space before the `/`: the space causes `$hasattributes = true`. Also, it doesn't output `$beforeelem`. I'm also going to extend mine to handle entities... Do you have this template on GitHub or somewhere else I can contribute changes? – Emyr Jun 06 '14 at 15:05
  • @Emyr: No, this is not available or posted anywhere else. Should you want to develop this further on github or whatever, I'd like that and you have my consent. But as long as the relevance is strong enough, don't forget to refer here with the proper permalink (and vice versa?) Good luck! :] – mousio Jun 08 '14 at 20:07

I wrote a SAX parser for XML-escaped strings in pure XSLT 1.0 + EXSLT:

<xsl:stylesheet 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:pxml="https://github.com/ilyakharlamov/pure-xsl/parseStringAsXML"
    version="1.0">
    <xsl:import href="https://raw.githubusercontent.com/ilyakharlamov/pure-xsl/master/parseStringAsXML.xsl"/>
    <xsl:template match="/">
        <xsl:call-template name="pxml:parseStringAsXML">
            <xsl:with-param name="string">&lt;PARAMETERS&gt;&lt;TIMEOUTDATETIME&gt;2011-03-15 2:09:48.997&lt;/TIMEOUTDATETIME&gt;&lt;/PARAMETERS&gt;</xsl:with-param>
        </xsl:call-template>
    </xsl:template>
</xsl:stylesheet>

Output:

<PARAMETERS>
   <TIMEOUTDATETIME>2011-03-15 2:09:48.997</TIMEOUTDATETIME>
</PARAMETERS>
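To apply the same named template to the question's SOAP envelope instead of a hard-coded string, one option is an identity copy that calls pxml:parseStringAsXML only for sXmlParameters. This is a minimal, untested sketch that assumes the imported module behaves as in the example above; the mode name unescape-soap is hypothetical and only serves to keep this identity template from overriding any default-mode templates inside the imported module:

```xml
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:pxml="https://github.com/ilyakharlamov/pure-xsl/parseStringAsXML"
    version="1.0">
    <xsl:import href="https://raw.githubusercontent.com/ilyakharlamov/pure-xsl/master/parseStringAsXML.xsl"/>

    <!-- start the copy in a separate, hypothetically named mode -->
    <xsl:template match="/">
        <xsl:apply-templates select="node()" mode="unescape-soap"/>
    </xsl:template>

    <!-- identity copy for every node except sXmlParameters -->
    <xsl:template match="@*|node()" mode="unescape-soap">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()" mode="unescape-soap"/>
        </xsl:copy>
    </xsl:template>

    <!-- replace the escaped string with the parsed elements -->
    <xsl:template match="sXmlParameters" mode="unescape-soap">
        <xsl:copy>
            <xsl:call-template name="pxml:parseStringAsXML">
                <xsl:with-param name="string" select="string(.)"/>
            </xsl:call-template>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
```

The sXmlParameters rule wins over the generic one because a plain element-name pattern has a higher default priority than node().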
Ilya Kharlamov

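I found that I can use Saxon to do this in a much simpler way:

<!-- assumes xmlns:saxon="http://saxon.sf.net/" is declared on xsl:stylesheet
     and an identity template copies the parsed nodes -->
<xsl:template match="sXmlParameters">
  <sXmlParameters>
    <!-- saxon:parse() turns the escaped string back into a document node -->
    <xsl:apply-templates select="saxon:parse(.)" />
  </sXmlParameters>
</xsl:template>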

There is also saxon:serialize(), which can be used to escape the XML.
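Without a vendor extension, XSLT 3.0 provides the standard XPath functions parse-xml() and serialize() for the same two directions. A minimal sketch of the unescaping direction, assuming an XSLT 3.0 processor (such as a current Saxon-HE release) and that the escaped text is a single well-formed document with one root element, since parse-xml() raises an error otherwise:

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- XSLT 3.0 shorthand for the identity template: copy every node unchanged -->
    <xsl:mode on-no-match="shallow-copy"/>

    <!-- parse the escaped string and put the resulting elements in place of the text -->
    <xsl:template match="sXmlParameters">
        <xsl:copy>
            <xsl:copy-of select="parse-xml(string(.))"/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>
```

Because parse-xml() works on the string value, it treats entity-escaped and CDATA-wrapped payloads alike; for content without a single root element, parse-xml-fragment() is the corresponding function.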

Thanks to all for your input.

user668595