
I'm using XSLT to transform one XML format into another, but I also need to do some value substitutions at the same time, if possible. Can someone suggest a way to change a large number of values? For example, "AppName" should become "1" and "AppNameTwo" should become "2". Ideally I'd like to do this via some kind of lookup list inside the XSLT:

<Application>
 <oldvalue="AppName" replacewith="1">
 <oldvalue="AppNameTwo" replacewith="2">
</Application>
<ResponseOne>
 <oldvalue="True" replacewith="Okay">
 <oldvalue="False" replacewith="Error">
</ResponseOne>

The only alternative I can currently think of is a large number of nested replaces.

Input

<Message>
  <Header>
    <Application>AppName</Application>
    <ResponseOne>True</ResponseOne>
    ...
  </Header>
</Message>

XSLT so far

    <?xml version="1.0" encoding="utf-8"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="/">
        <n1:Message>
          <Header>
            <Application><xsl:value-of select="//Message/Organisation/Application/Name"/></Application>
            <Response><xsl:value-of select="//Message/Organisation/Application/ResponseOne"/></Response>
            ...
          </Header>
        </n1:Message>
      </xsl:template>
    </xsl:stylesheet>

Required Output

<?xml version="1.0" encoding="utf-8"?>
<n1:Message>
  <Header>
    <Application>1</Application>
    <Response>Okay</Response>
    ...
  </Header>
</n1:Message>

I intend to run this XSLT within Visual Studio 2010.

Jonathan
  • Does the VS2010 XSLT engine support XSLT2, or are you stuck with XSLT1? – Jim Garrison Aug 23 '11 at 02:39
  • As far as I know, only XSLT 1.0. I intend to apply the stylesheet as described in the answer to this question: [SO:34093](http://stackoverflow.com/questions/34093/how-to-apply-an-xslt-stylesheet-in-c) – Jonathan Aug 23 '11 at 02:44
  • Good question, +1. See my answer for a short and simple solution that has only two templates, doesn't require any extension functions and can work with a huge number of replacement rules without needing to modify the code. An extensive explanation is also provided. – Dimitre Novatchev Aug 23 '11 at 03:58

3 Answers


Here's a tested example using exslt:node-set(), which should be available in the MS XML processor:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exslt="http://exslt.org/common"
    version="1.0">
    <xsl:variable name="tbl">
        <Application>
            <oldvalue val="AppName" replacewith="1"/>
            <oldvalue val="AppNameTwo" replacewith="2"/>
        </Application>
        <ResponseOne>
            <oldvalue val="True" replacewith="Okay"/>
            <oldvalue val="False" replacewith="Error"/>
        </ResponseOne>
    </xsl:variable>

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="Application">
        <xsl:variable name="old" select="./text()"/>
        <xsl:copy><xsl:value-of select="exslt:node-set($tbl)/Application/oldvalue[@val=$old]/@replacewith"/></xsl:copy>
    </xsl:template>
</xsl:stylesheet>

This uses a variable containing the lookup table (I fixed your XML for the table), along with an identity transform that copies the input to the output. Then there's a template for the Application nodes that does the conversion using the lookup table. You need the exslt:node-set() function to convert the result tree fragment to a node-set that can be searched with XPath.

I've left the conversion of the <ResponseOne> tags for you to do. Hint: just make another template like the one for Application.
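Following that hint, one way to avoid writing a separate template per element is a single template that matches both element names and picks the matching section of $tbl by name. This is an untested sketch that assumes the same exslt:node-set() support as the stylesheet above. The element names still have to be listed in the match pattern, because XSLT 1.0 patterns cannot refer to variables such as $tbl. Like the Application template above, it outputs an empty element when a value has no entry in the table.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exslt="http://exslt.org/common"
    version="1.0">
    <!-- Same lookup table as above: one section per element name -->
    <xsl:variable name="tbl">
        <Application>
            <oldvalue val="AppName" replacewith="1"/>
            <oldvalue val="AppNameTwo" replacewith="2"/>
        </Application>
        <ResponseOne>
            <oldvalue val="True" replacewith="Okay"/>
            <oldvalue val="False" replacewith="Error"/>
        </ResponseOne>
    </xsl:variable>

    <!-- Identity transform: copy everything not matched below -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- One template for every element that has a section in $tbl -->
    <xsl:template match="Application | ResponseOne">
        <xsl:variable name="old" select="string(.)"/>
        <!-- Section of the table whose name equals the current element's name -->
        <xsl:variable name="section"
            select="exslt:node-set($tbl)/*[name() = name(current())]"/>
        <xsl:copy>
            <xsl:value-of select="$section/oldvalue[@val = $old]/@replacewith"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
```

Adding another element then means adding a section to $tbl and its name to the match pattern.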

Jim Garrison

This simple transformation (a single template overriding the identity rule, and no extension functions needed) allows the use of a huge number of replacement rules without any change to the code. Another alternative is to specify the value of the global parameter $pReps outside the transformation, in which case the code can be simplified slightly.

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:param name="pReps">
  <elem name="Application">
   <replace>
     <this>AppName</this>
     <with>1</with>
   </replace>
   <replace>
     <this>AppNameTwo</this>
     <with>2</with>
   </replace>
  </elem>
  <elem name="ResponseOne">
   <replace>
     <this>True</this>
     <with>Okay</with>
   </replace>
   <replace>
     <this>False</this>
     <with>Error</with>
   </replace>
  </elem>
 </xsl:param>

 <xsl:variable name="vReps" select=
 "document('')/*/xsl:param[@name='pReps']"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="text()">
  <xsl:variable name="vNewVal" select=
   "$vReps/elem
       [@name=name(current()/..)]
              /replace[this = current()]
                 /with/text()
   "/>

   <xsl:copy-of select=
    "$vNewVal | self::text()[not($vNewVal)]"/>
 </xsl:template>
</xsl:stylesheet>

When applied to the provided XML document:

<Message>
  <Header>
    <Application>AppName</Application>
    <ResponseOne>True</ResponseOne>
    ...
  </Header>
</Message>

the wanted, correct result is produced:

<Message>
   <Header>
      <Application>1</Application>
      <ResponseOne>Okay</ResponseOne>
      ...
   </Header>
</Message>

Explanation:

  1. The identity rule (template) copies every node "as-is".

  2. The replacement rules are coded as a sequence of elem elements that are children of the global parameter pReps. The structure and meaning of every elem element should be self-explanatory.

  3. There is a single template overriding the identity rule, and it matches any text node. Within this template, a possible new value is computed in the variable $vNewVal. This is either the empty node-set (when $pReps has no elem whose name attribute equals the name of the current node's parent and that contains a replace/this equal to the current node's string value) or, if there is a match, the text of the with sibling of the matching replace/this. Finally, the union $vNewVal | self::text()[not($vNewVal)] is copied: it selects $vNewVal when that is non-empty, and the current text node otherwise.
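The opening paragraph of this answer mentions supplying $pReps from outside the transformation. A minimal sketch of that variant follows, assuming the calling code can pass a node-set as the value of a global parameter (not every processor or command-line tool allows this) and that the rules live in a separate document whose root element wraps the elem elements; the reps wrapper name is hypothetical. The document('') lookup and the $vReps variable are then no longer needed.

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <!-- Assumption: the caller sets pReps to the root node of a separate rules
      document such as <reps><elem name="Application">...</elem></reps>
      (the <reps> wrapper is a hypothetical name). The default is the empty
      node-set, in which case every text node is copied unchanged. -->
 <xsl:param name="pReps" select="/.."/>

 <!-- Identity rule -->
 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <!-- Same lookup as before, but reading $pReps directly -->
 <xsl:template match="text()">
  <xsl:variable name="vNewVal" select=
   "$pReps/*/elem
       [@name=name(current()/..)]
          /replace[this = current()]
             /with/text()
   "/>

  <xsl:copy-of select=
    "$vNewVal | self::text()[not($vNewVal)]"/>
 </xsl:template>
</xsl:stylesheet>
```

Because the default is an empty node-set, running this without supplying the parameter simply behaves as an identity transform.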

Dimitre Novatchev
  • Excellent; thanks very much! This works perfectly and it's very portable for use in other situations. – Jonathan Aug 23 '11 at 06:59

Here's an approach using keys. The template is simple, without complicated expressions, and the replacement rules are kept separate from the code. The rules can be extended without any code modification.

lookup.xml

<?xml version="1.0"?>
<replacements>
    <Application>
        <replace oldvalue="AppName" replacewith="1"/>
        <replace oldvalue="AppNameTwo" replacewith="2"/>
    </Application>
    <ResponseOne>
        <replace oldvalue="True" replacewith="Okay"/>
        <replace oldvalue="False" replacewith="Error"/>
    </ResponseOne>
</replacements>

input.xml

<?xml version="1.0"?>
<Message>
    <Header>
        <Application>AppName</Application>
        <ResponseOne>True</ResponseOne>
    </Header>
    <Header>
        <Application>AppNameTwo</Application>
        <ResponseOne>False</ResponseOne>
    </Header>
</Message>

transform.xsl

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>
    <xsl:strip-space elements="*"/>
    <xsl:variable name="lookup" select="document('lookup.xml')"/>
    <xsl:key name="MasterKey" match="/replacements/*/replace" use="concat(local-name(..), ':', @oldvalue)"/>
    <xsl:template match="/Message">
        <Message>
            <xsl:apply-templates select="$lookup"/>
            <xsl:apply-templates/>
        </Message>
    </xsl:template>
    <xsl:template match="replacements"/>
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="Header/*">
        <xsl:variable name="ThisKey" select="concat(local-name(), ':', text())"/>
        <xsl:variable name="nodename" select="local-name()"/>
        <xsl:choose>
            <xsl:when test="$lookup/replacements/*[name() =$nodename]">
                <xsl:element name="{$nodename}">
                    <xsl:for-each select="$lookup/replacements[1]">
                        <xsl:value-of select="key('MasterKey', $ThisKey)/@replacewith"/>
                    </xsl:for-each>
                </xsl:element>
            </xsl:when>
            <xsl:otherwise>
                <xsl:copy-of select="."/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
</xsl:stylesheet>

Output:

<?xml version="1.0" encoding="UTF-16"?>
<Message>
    <Header>
        <Application>1</Application>
        <ResponseOne>Okay</ResponseOne>
    </Header>
    <Header>
        <Application>2</Application>
        <ResponseOne>Error</ResponseOne>
    </Header>
</Message>
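The comments below debate what should happen when an element name has a section in lookup.xml but its value has no matching replace entry: the template above outputs an empty element in that case. If unmatched values should instead pass through unchanged, one possible variant (an untested sketch, reusing the same lookup.xml and key) checks whether the key found a rule before choosing what to output. The Message and replacements templates are omitted here because the identity template already copies Message.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>
    <xsl:strip-space elements="*"/>
    <xsl:variable name="lookup" select="document('lookup.xml')"/>
    <xsl:key name="MasterKey" match="/replacements/*/replace"
             use="concat(local-name(..), ':', @oldvalue)"/>

    <!-- Identity transform: copy everything not handled below -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="Header/*">
        <xsl:variable name="ThisKey" select="concat(local-name(), ':', .)"/>
        <xsl:variable name="original" select="."/>
        <xsl:copy>
            <xsl:apply-templates select="@*"/>
            <!-- key() only searches the document containing the context node,
                 so switch the context to the lookup document first -->
            <xsl:for-each select="$lookup">
                <xsl:variable name="rule" select="key('MasterKey', $ThisKey)"/>
                <xsl:choose>
                    <xsl:when test="$rule">
                        <xsl:value-of select="$rule[1]/@replacewith"/>
                    </xsl:when>
                    <xsl:otherwise>
                        <!-- No rule for this name/value pair: keep the original content -->
                        <xsl:apply-templates select="$original/node()"/>
                    </xsl:otherwise>
                </xsl:choose>
            </xsl:for-each>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
```

With this design an input such as AppNameThree is copied through as-is; if you prefer to flag unmatched values instead, the otherwise branch is the single place to change.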
Richard A
  • Unfortunately, this substitutes text nodes with the empty string in all cases where there isn't a match with the lookup file! – Dimitre Novatchev Aug 23 '11 at 12:42
  • Thanks, Dimitre. I don't think the OP specified behavior when there was no match. To me an empty string would be preferable to returning the old text. – Richard A Aug 23 '11 at 21:20
  • Ah, I understand you now, Dimitre; your comment wasn't very clear. I think you mean when an element is entirely missing from the lookup table and is meant to be copied as-is. I'll edit my XSL to handle that case. – Richard A Aug 23 '11 at 22:02
  • Not only when entirely missing, but also when its name is in the lookup table but its value isn't equal to any `@oldvalue`. – Dimitre Novatchev Aug 23 '11 at 22:13
  • And also, @Richard A, nobody would ever want a replace to delete the original when it isn't in the replacement list. Why do you expect that what is preferable to *you* should matter to others? Do you have some additional points to share in favor of deleting the original, or was this just an impulse? – Dimitre Novatchev Aug 23 '11 at 22:23
  • In applications I have worked on where we have been translating data, we have preferred to know when there is no translation value in the table, as this is either an error in the data or a case that should be handled consciously. If it translates to null, or to a fixed value, then it is easy to write exception code to catch those cases. If it returns the old value then not only is it impossible to anticipate what might be returned, but it might be misleading. – Richard A Aug 24 '11 at 02:38
  • As a simple example, if Okay was passed in it would be returned as Okay when it may well have been erroneous input data. Of course the handling of these conditions should be specified and either case might be the desired solution, but to assume that returning the original value is preferable is naive. To say that 'nobody would ever want a replace to delete the original when it isn't in the replacement list' is a surprising comment. No, not an impulse, knowledge from experience. – Richard A Aug 24 '11 at 02:39
  • You don't understand: Your code deletes the original value even if the "*oldvalue*" in the lookup table doesn't match the current value. – Dimitre Novatchev Aug 24 '11 at 02:40
  • I'm clearly missing your point. I'm not trying to be obtuse. I thought what you were saying was bad was that if I passed in AppNameThree I should return AppNameThree rather than . – Richard A Aug 24 '11 at 02:55
  • Yes, this is exactly what I am saying. `replace($s, $t, $v)` means replace all `$t` in `$s` with `$v`. When `$s` doesn't contain any `$t`, the result is `$s` unchanged, not the empty string! – Dimitre Novatchev Aug 24 '11 at 03:03
  • Great, Dimitre, I'm glad that I understand you. I'm sure your definition of replace() is correct. What I am saying is that if I am translating some data with a lookup table I would like to know that a value hasn't been translated, not just return the input data. The elements above are a prime example. They are being translated to numbers and yet a value not in the table will most likely not be a number. Downstream processing will give unexpected results because the data error was not caught at the time of translation. – Richard A Aug 24 '11 at 03:12
  • You don't know, @Richard A, what the requirements for this are. If there were such special requirements, they would have been defined in the question. Your answer has a destructive effect on the data just because you assume something that hasn't been specified. Certainly, this is deeply wrong. – Dimitre Novatchev Aug 24 '11 at 03:36
  • Dimitre, I accept that your solution of putting the input data into the output field is one possible solution. I do not accept that there is anything inherently wrong with flagging erroneous data by removing it. I think we both agree that the question does not say what to do with unmatched data. – Richard A Aug 24 '11 at 03:51
  • Destroying data *by default* is never a right solution, Richard. Even the military know this. Flagging is not the same as destroying. – Dimitre Novatchev Aug 24 '11 at 03:57
  • You are, of course, entitled to your opinion, Dimitre. I am not destroying the data, the original xml file exists. I could be equally dogmatic and state that passing erroneous data by default is never the right solution. I think that both possibilities have their place depending upon the context. (I'm not sure why the reference to the military is relevant, or whose military you are referring to.) I suspect that we will have to agree to disagree on this one. – Richard A Aug 24 '11 at 04:02
  • Hi Richard, thanks for your answer; it provided some insight to me as well. In this case Dimitre got the answer spot on: I didn't want to lose any data, only to replace some values while passing the majority through untouched. BTW I haven't downvoted anyone; all answers were useful to my learning. – Jonathan Aug 24 '11 at 04:24
  • Thanks Jonathan, I appreciate your feedback. – Richard A Aug 24 '11 at 04:28
  • Dimitre, the intolerance and dogmatism that you have expressed in your comments, not only here, is the type of attitude that will discourage others from contributing to SO. I have accepted that your opinion on how to handle this issue is equally valid to my own. You have not only rejected my opinion, but decided without any knowledge of my occupation or experience that I should not 'be involved with data processing at all'. I find this sort of arrogance irritating and destructive to open dialogue. – Richard A Aug 25 '11 at 03:01