I am completely new to the world of XSLT so please forgive me if I don't use the proper terms.

I need to integrate with an external system, and this system returns a string wrapped in a CDATA section, similar to this:

<Response xmlns="http://tempuri.org/">
    <Result>
        <![CDATA[
  <resultcode>0</resultcode>
  <message>OK</message>
  <data>&lt;unitlist version="1"&gt;&lt;unit version="1" unitid="%" abbreviation="%" name="Porcentaje" /&gt;
  &lt;unit version="1" unitid="1/2 lb." abbreviation="1/2 lb." name="1/2 libra" /&gt;&lt;unit version="1" unitid="1/2 pt." abbreviation="1/2 pt." name="medias pintas" /&gt;&lt;unit version="1" unitid="1/2 pulg." abbreviation="1/2 pulg." name="1/2 pulg." /&gt;&lt;unit version="1" unitid="1/2&amp;quot; cdr." abbreviation="1/2&amp;quot; cdr." name="1/2 pulgada cuadrada" /&gt;&lt;/unitlist&gt;</data>
        ]]>
    </Result>
</Response>

I need to retrieve the data node, and parse every unit into something like this:

<units>
    <unit>
        <id>%</id>
        <name>Porcentaje</name>
        <abbreviation>%</abbreviation>
    </unit>
    <unit>
        <id>1/2 lb.</id>
        <name>1/2 libra</name>
        <abbreviation>1/2 lb.</abbreviation>
    </unit>
</units>

I have been reading about two-phase transformations, and trying to wrap the data into a variable using:

<xsl:value-of select="substring-before(substring-after(., '&lt;data&gt;'), '&lt;/data&gt;')" disable-output-escaping="yes" />

It works for escaping the text tags and getting the nodes, but I am not able to use that to iterate with a for-each and create the XML I need.

I need to do this in one single XSLT.

Thanks in advance for your help.

tomaspozo
  • Which XSLT processor will you be using? – michael.hor257k Jul 16 '17 at 18:41
  • Hi @michael.hor257k, I am not an expert on this. This is the header of my XSLT file; I don't know whether it shows the processor: ` ` I am trying not to add more dependencies to the file because I don't know whether I would be able to install them, so I would like to keep it as simple as possible. – tomaspozo Jul 16 '17 at 21:41
  • That has nothing to do with my question. I asked about your XSLT processor, not your XSLT stylesheet. If you don't know, see here how to find out: https://stackoverflow.com/questions/25244370/how-can-i-check-which-xslt-processor-is-being-used-in-solr/25245033#25245033 -- P.S. You need to have a minimal understanding of the subject matter before asking here - otherwise you won't be able to understand the answers given to you. – michael.hor257k Jul 16 '17 at 22:06
  • @michael.hor257k thanks for the clarification, I am new to this stuff and trying to learn fast. I have followed the steps in the link provided: vendor: Microsoft, version: 1. – tomaspozo Jul 16 '17 at 22:35

2 Answers

This is the most horrible XML I have seen for a long time.

Once you've parsed this XML into a tree, you will find there is a text node containing:

<resultcode>0</resultcode>
  <message>OK</message>
  <data>&lt;unitlist version="1"&gt;&lt;unit version="1" unitid="%" abbreviation="%" name="Porcentaje" /&gt;
  &lt;unit version="1" unitid="1/2 lb." abbreviation="1/2 lb." name="1/2 libra" /&gt;&lt;unit version="1" unitid="1/2 pt." abbreviation="1/2 pt." name="medias pintas" /&gt;&lt;unit version="1" unitid="1/2 pulg." abbreviation="1/2 pulg." name="1/2 pulg." /&gt;&lt;unit version="1" unitid="1/2&amp;quot; cdr." abbreviation="1/2&amp;quot; cdr." name="1/2 pulgada cuadrada" /&gt;&lt;/unitlist&gt;</data>

which is sort-of-xml, except that it doesn't have a containing wrapper element. So you need to extract this text node as a string, wrap it in a dummy element, and then parse it using an XML parser.

When you've done that, you'll have a tree in which there is a text node (the child of the data element) containing the content:

<unitlist version="1"><unit version="1" unitid="%" abbreviation="%" name="Porcentaje" />
  <unit version="1" unitid="1/2 lb." abbreviation="1/2 lb." name="1/2 libra" /><unit version="1" unitid="1/2 pt." abbreviation="1/2 pt." name="medias pintas" /><unit version="1" unitid="1/2 pulg." abbreviation="1/2 pulg." name="1/2 pulg." /><unit version="1" unitid="1/2&quot; cdr." abbreviation="1/2&quot; cdr." name="1/2 pulgada cuadrada" /></unitlist>

Which is XML... so you can put it through an XML parser, after which you will be able to use the usual XML APIs to access the content of attributes like @unitid.
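
As a rough illustration of the parse, wrap, parse again sequence described above, here is a minimal sketch in Python using only the standard library's `xml.etree.ElementTree`. It is not XSLT and not part of the original answer; the function name `extract_units`, the dummy element name `wrapper`, and the shortened two-unit sample are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the response from the question (two units only).
RESPONSE = '''<Response xmlns="http://tempuri.org/">
    <Result>
        <![CDATA[
  <resultcode>0</resultcode>
  <message>OK</message>
  <data>&lt;unitlist version="1"&gt;&lt;unit version="1" unitid="%" abbreviation="%" name="Porcentaje" /&gt;
  &lt;unit version="1" unitid="1/2&amp;quot; cdr." abbreviation="1/2&amp;quot; cdr." name="1/2 pulgada cuadrada" /&gt;&lt;/unitlist&gt;</data>
        ]]>
    </Result>
</Response>'''

NS = {"ns": "http://tempuri.org/"}


def extract_units(response_xml):
    """Hypothetical helper: unwrap both escaping layers and build <units>."""
    # Pass 1: parse the outer document. The CDATA content is plain text here.
    outer = ET.fromstring(response_xml)
    inner_text = outer.find("ns:Result", NS).text

    # Pass 2: that text has no single root element, so wrap it in a dummy one.
    inner = ET.fromstring("<wrapper>" + inner_text + "</wrapper>")

    # Pass 3: the text of <data> is itself a serialized <unitlist> document.
    unitlist = ET.fromstring(inner.find("data").text)

    # Build the target structure from the attributes of each <unit>.
    units = ET.Element("units")
    for src in unitlist.findall("unit"):
        unit = ET.SubElement(units, "unit")
        ET.SubElement(unit, "id").text = src.get("unitid")
        ET.SubElement(unit, "name").text = src.get("name")
        ET.SubElement(unit, "abbreviation").text = src.get("abbreviation")
    return units


if __name__ == "__main__":
    print(ET.tostring(extract_units(RESPONSE), encoding="unicode"))
```

Each level of escaping gets its own parser pass, which is the point of the description above: the attributes only become accessible after the third parse.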

Whoever designed this had a perverted sense of humour. It's the kind of weird mindset that uses "half a pound" as a unit of weight.

Michael Kay
  • Thanks for your feedback. I know it is horrible, it has been a nightmare. Could you help me by providing a sample XSLT that does what you describe? I have tried to wrap the data string into a variable using disable-output-escaping="yes", and it transforms to the XML you mentioned. But now I can't figure out how to reprocess that in a new template: for some reason, when I pass the variable to a new template, it is not read as XML, and the ugly text comes back again... – tomaspozo Jul 16 '17 at 22:13

> I need to retrieve the data node, and parse every unit into something like this:

Your data provider does not want you to parse the given XML. They have gone to a lot of trouble to make it especially difficult for you. If you cannot persuade them to change their format, then I suggest you take the following steps:

Step 1: Apply the following stylesheet to the input XML:

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:ns="http://tempuri.org/"
exclude-result-prefixes="ns">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<xsl:template match="/ns:Response">
    <result>
        <xsl:value-of select="ns:Result" disable-output-escaping="yes"/>
    </result>
</xsl:template>

</xsl:stylesheet>

and save the result to a file.

Step 2: Apply the following stylesheet to the file produced in step #1:

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<xsl:template match="/result">
    <xsl:value-of select="data" disable-output-escaping="yes"/>
</xsl:template>

</xsl:stylesheet>

and save the result to a file.

Step 3: Apply the following stylesheet to the file produced in step #2:

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<xsl:template match="/unitlist">
    <units>
        <xsl:for-each select="unit">
            <xsl:copy>
                <id>
                    <xsl:value-of select="@unitid" />
                </id>
                <name>
                    <xsl:value-of select="@name" />
                </name>
                <abbreviation>
                    <xsl:value-of select="@abbreviation" />
                </abbreviation>
            </xsl:copy>
        </xsl:for-each>
    </units>
</xsl:template>

</xsl:stylesheet>

For clarity, these are the results you should get in each step:

Step 1

<?xml version="1.0" encoding="UTF-8"?>
<result>

  <resultcode>0</resultcode>
  <message>OK</message>
  <data>&lt;unitlist version="1"&gt;&lt;unit version="1" unitid="%" abbreviation="%" name="Porcentaje" /&gt;
  &lt;unit version="1" unitid="1/2 lb." abbreviation="1/2 lb." name="1/2 libra" /&gt;&lt;unit version="1" unitid="1/2 pt." abbreviation="1/2 pt." name="medias pintas" /&gt;&lt;unit version="1" unitid="1/2 pulg." abbreviation="1/2 pulg." name="1/2 pulg." /&gt;&lt;unit version="1" unitid="1/2&amp;quot; cdr." abbreviation="1/2&amp;quot; cdr." name="1/2 pulgada cuadrada" /&gt;&lt;/unitlist&gt;</data>

    </result>

Step 2

<?xml version="1.0" encoding="UTF-8"?>
<unitlist version="1"><unit version="1" unitid="%" abbreviation="%" name="Porcentaje" />
  <unit version="1" unitid="1/2 lb." abbreviation="1/2 lb." name="1/2 libra" /><unit version="1" unitid="1/2 pt." abbreviation="1/2 pt." name="medias pintas" /><unit version="1" unitid="1/2 pulg." abbreviation="1/2 pulg." name="1/2 pulg." /><unit version="1" unitid="1/2&quot; cdr." abbreviation="1/2&quot; cdr." name="1/2 pulgada cuadrada" /></unitlist>

Step 3

<?xml version="1.0" encoding="UTF-8"?>
<units>
  <unit>
    <id>%</id>
    <name>Porcentaje</name>
    <abbreviation>%</abbreviation>
  </unit>
  <unit>
    <id>1/2 lb.</id>
    <name>1/2 libra</name>
    <abbreviation>1/2 lb.</abbreviation>
  </unit>
  <unit>
    <id>1/2 pt.</id>
    <name>medias pintas</name>
    <abbreviation>1/2 pt.</abbreviation>
  </unit>
  <unit>
    <id>1/2 pulg.</id>
    <name>1/2 pulg.</name>
    <abbreviation>1/2 pulg.</abbreviation>
  </unit>
  <unit>
    <id>1/2" cdr.</id>
    <name>1/2 pulgada cuadrada</name>
    <abbreviation>1/2" cdr.</abbreviation>
  </unit>
</units>

The alternative is to extract the data using string functions, which would be extremely tedious and error-prone.

michael.hor257k
  • Thanks @michael.hor257k for such a great explanation. Could these steps be achieved in a single XSLT file? I am reading about XSL pipeline processing, and I wanted to ask whether your steps could be achieved using that approach, so that I have a single file for the transformation. The problem we have is that the program that invokes this service only lets us specify a single XSLT for the transformation. – tomaspozo Jul 17 '17 at 06:48
  • I am not sure I understand your question. The described process cannot be achieved using a single XSL **transformation**, because `disable-output-escaping` is applied only when the output is serialized, i.e. saved to a file. You *could* use a single XSLT file for the 3 transformations, but I don't see what difference that would make. You would still need to process 3 separate XML files. – michael.hor257k Jul 17 '17 at 07:18
  • OK, I understand now: the problem is `disable-output-escaping`, which won't work in a single transformation. Thanks again for your help and feedback. – tomaspozo Jul 17 '17 at 17:41