Is there a function that can be used to remove duplicate values from a string separated by |, in the simplest possible way? Here is an example of the string:

1111-1|1111-1|1111-3|1111-4|1111-5|1111-3

The output that I'm expecting is:

1111-1|1111-3|1111-4|1111-5

Thanks in advance.

vic-rattlehead
  • Please always tag the question with the version of XSLT that you are using: `xslt-1.0`, `xslt-2.0` or `xslt-3.0`. – Mathias Müller Jan 21 '16 at 08:14
  • Which XSLT 1.0 processor are you using? – michael.hor257k Jan 21 '16 at 08:25
  • @michael.hor257k It is an in-house processor, not one in common use. – vic-rattlehead Jan 21 '16 at 09:05
  • @vic-rattlehead The question is which extension functions it supports; without tokenize() and distinct() this is not going to be simple. – michael.hor257k Jan 21 '16 at 09:12
  • @vic-rattlehead Aren't you going to answer the question? – michael.hor257k Jan 22 '16 at 08:31
  • @michael.hor257k I apologize, I got busy with other things. I need to try the functions first to find out whether they are supported. It's a bit complicated with the in-house technology that I'm using. I'll get back to you once I'm done. – vic-rattlehead Jan 22 '16 at 10:09
  • This depends largely on your definition of 'simplest'. To be frank, the simplest solution is not to use XSLT! If you have to use XSLT to process your data, you're far better off rearranging it into a more query-able form, like `1111-11111-3etc.` Storing multiple values in a single XML element or attribute is never a good idea. – Flynn1179 Jan 23 '16 at 22:50
  • @vic-rattlehead Please take the time to close this question, if it has been answered. – michael.hor257k Jul 05 '16 at 07:02

4 Answers

To do this in pure XSLT 1.0, with no extension functions, you will need to use a recursive named template:

<xsl:template name="distinct-values-from-list">
    <xsl:param name="list"/>
    <xsl:param name="delimiter" select="'|'"/>          
    <xsl:choose>
        <xsl:when test="contains($list, $delimiter)">
            <xsl:variable name="token" select="substring-before($list, $delimiter)" />
            <xsl:variable name="next-list" select="substring-after($list, $delimiter)" />           
            <!-- output token if it is unique -->
            <xsl:if test="not(contains(concat($delimiter, $next-list, $delimiter), concat($delimiter, $token, $delimiter)))">
                <xsl:value-of select="concat($token, $delimiter)"/>
            </xsl:if>
            <!-- recursive call -->
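            <xsl:call-template name="distinct-values-from-list">
                <xsl:with-param name="list" select="$next-list"/>
                <!-- pass the delimiter on, so a non-default delimiter survives the recursion -->
                <xsl:with-param name="delimiter" select="$delimiter"/>
            </xsl:call-template>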
        </xsl:when>
        <xsl:otherwise>
            <xsl:value-of select="$list"/>
        </xsl:otherwise>
    </xsl:choose>   
</xsl:template>

Full demo: http://xsltransform.net/ncdD7mM


Added:

The above method outputs the last occurrence of each value in the list, because that's the simplest way to remove the duplicates.

The side effect of this is that the original order of the values is not preserved. Or - more correctly - it is the reverse order that is being preserved.

I would not think preserving the original forward order is of any importance here. But in case you do need it, it could be done this way (which I believe is much easier to follow than the suggested alternative):

<xsl:template name="distinct-values-from-list">
    <xsl:param name="list"/>
    <xsl:param name="delimiter" select="'|'"/>    
    <xsl:param name="result"/> 
    <xsl:choose>
        <xsl:when test="$list">
            <xsl:variable name="token" select="substring-before(concat($list, $delimiter), $delimiter)" /> 
            <!-- recursive call -->
            <xsl:call-template name="distinct-values-from-list">
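                <xsl:with-param name="list" select="substring-after($list, $delimiter)"/>
                <!-- pass the delimiter on, so a non-default delimiter survives the recursion -->
                <xsl:with-param name="delimiter" select="$delimiter"/>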
                <xsl:with-param name="result">
                    <xsl:value-of select="$result"/>
                    <!-- add token if this is its first occurrence -->
                    <xsl:if test="not(contains(concat($delimiter, $result, $delimiter), concat($delimiter, $token, $delimiter)))">
                        <xsl:value-of select="concat($delimiter, $token)"/>
                    </xsl:if>
                </xsl:with-param>
            </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
            <xsl:value-of select="substring($result, 2)"/>
        </xsl:otherwise>
    </xsl:choose>   
</xsl:template>
michael.hor257k
  • Two corrections: 1) The order of the results of the 1st transformation is not the *reverse* of the original order as stated -- it is quite random! `1111-1|1111-4|1111-5|1111-3` is **not** the reverse of `1111-1|1111-3|1111-4|1111-5` The 2nd comment to follow ... – Dimitre Novatchev Jan 23 '16 at 19:33
  • Comment 2: The added new solution crashes with some XSLT processors on sufficiently long input. – Dimitre Novatchev Jan 23 '16 at 19:41
  • @DimitreNovatchev Well, point #1 is incorrect (the result you claim is not the actual result received) and point #2 is too vague to take seriously. – michael.hor257k Jan 23 '16 at 21:07

All presented XSLT 1.0 solutions so far produce the wrong result:

1111-1|1111-4|1111-5|1111-3

whereas the wanted, correct result is:

1111-1|1111-3|1111-4|1111-5

Now, the following transformation (no extensions, pure XSLT 1.0):

<xsl:stylesheet version="1.0"  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="text"/>

  <xsl:template match="text()" name="distinctSubstrings">
    <xsl:param name="pText" select="."/>
    <xsl:param name="poutDelim"/>
    <xsl:param name="pFoundDistinctSubs" select="'|'"/>
    <xsl:param name="pCountDistinct" select="0"/>

    <xsl:if test="$pText">
      <xsl:variable name="vnextSub" select="substring-before(concat($pText, '|'), '|')"/>
      <xsl:variable name="vIsNewDistinct" select=
          "not(contains(concat($pFoundDistinctSubs, '|'), concat('|', $vnextSub, '|')))"/>
      <xsl:variable name="vnextDistinct" select=
      "substring(concat($poutDelim,$vnextSub), 1 div $vIsNewDistinct)"/>

      <xsl:value-of select="$vnextDistinct"/>

      <xsl:variable name="vNewFoundDistinctSubs" 
           select="concat($pFoundDistinctSubs, $vnextDistinct)"/>
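      <!-- Use the delimiter once any distinct value has been output, counting the current one;
           testing $pCountDistinct alone would drop the delimiter before a new second token -->
      <xsl:variable name="vnextOutDelim" 
           select="substring('|', 2 - (($pCountDistinct + $vIsNewDistinct) > 0))"/>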

      <xsl:call-template name="distinctSubstrings">
        <xsl:with-param name="pText" select="substring-after($pText, '|')"/>
        <xsl:with-param name="pFoundDistinctSubs" select="$vNewFoundDistinctSubs"/>
        <xsl:with-param name="pCountDistinct" select="$pCountDistinct + $vIsNewDistinct"/>
        <xsl:with-param name="poutDelim" select="$vnextOutDelim"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>

when applied to this XML document (whose string value is the string given in the question):

<t>1111-1|1111-1|1111-3|1111-4|1111-5|1111-3</t>

produces the wanted, correct result:

1111-1|1111-3|1111-4|1111-5

Explanation:

  1. All found distinct substrings are concatenated in the parameter $pFoundDistinctSubs -- whenever we get the next substring from the delimited input, we compare it to the distinct substrings passed in this parameter. This ensures that the first in order distinct substring will be output -- not the last as in the other two solutions.

  2. We use conditionless value determination, based on the fact that XSLT 1.0 implicitly converts a Boolean false() to 0 and true() to 1 whenever it is used in a context that requires a numeric value. In particular, substring($x, 1 div true()) is equivalent to substring($x, 1 div 1), that is, substring($x, 1), which is the entire string $x. On the other hand, substring($x, 1 div false()) is equivalent to substring($x, 1 div 0), that is, substring($x, Infinity), which is the empty string.
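
The Boolean-to-number conversion described in point 2 can be checked in isolation. The stylesheet below is only a minimal sketch and is not part of the solution above; the variable name `x` and the bracket markers are illustrative assumptions. It can be applied to any XML document.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- hypothetical sample value, used only for illustration -->
    <xsl:variable name="x" select="'1111-3'"/>

    <!-- true() converts to 1, so the start position is 1: the whole string -->
    <xsl:text>[</xsl:text>
    <xsl:value-of select="substring($x, 1 div true())"/>
    <xsl:text>]</xsl:text>

    <!-- false() converts to 0, and 1 div 0 is Infinity: the empty string -->
    <xsl:text>[</xsl:text>
    <xsl:value-of select="substring($x, 1 div false())"/>
    <xsl:text>]</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

With a conforming XSLT 1.0 processor this should print `[1111-3][]`: the first call keeps the whole string and the second yields nothing.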

To learn why avoiding conditionals is important, see this Pluralsight course:

Tactical Design Patterns in .NET: Control Flow, by Zoran Horvat

Dimitre Novatchev

Assuming that you can use XSLT 2.0, and assuming that the input looks like

<?xml version="1.0" encoding="UTF-8"?>
<root>1111-1|1111-1|1111-3|1111-4|1111-5|1111-3</root>

you could use the distinct-values and tokenize functions:

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output method="xml" encoding="UTF-8" indent="yes" />

    <xsl:template match="/root">
      <result>
          <xsl:value-of separator="|" select="distinct-values(tokenize(.,'\|'))"/>
      </result>
    </xsl:template>

</xsl:transform>

And the result will be

<?xml version="1.0" encoding="UTF-8"?>
<result>1111-1|1111-3|1111-4|1111-5</result>
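
One caveat: the XPath 2.0 functions specification leaves the order of the sequence returned by distinct-values() implementation-dependent, so the first-occurrence order shown above is what the processor used here happened to produce, not a guarantee. If the order matters, a sketch along the following lines may help. It assumes the same `<root>` input and switches to text output for brevity, and it relies on xsl:for-each-group, which processes groups in order of first appearance:

```xml
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output method="text"/>

    <xsl:template match="/root">
        <!-- one group per distinct token; groups come in order of first appearance -->
        <xsl:for-each-group select="tokenize(., '\|')" group-by=".">
            <!-- inside for-each-group, position() is the position of the current group -->
            <xsl:if test="position() gt 1">
                <xsl:text>|</xsl:text>
            </xsl:if>
            <xsl:value-of select="current-grouping-key()"/>
        </xsl:for-each-group>
    </xsl:template>

</xsl:transform>
```

For the sample input this should give `1111-1|1111-3|1111-4|1111-5`.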
Mathias Müller

I have adapted the stylesheet below from "XSLT 1.0 How to get distinct values":

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">

    <xsl:output omit-xml-declaration="yes"/>

    <xsl:template match="/">
        <output>
            <xsl:call-template name="distinctvalues">
                <xsl:with-param name="values" select="root"/>
            </xsl:call-template>
        </output>
    </xsl:template>

    <xsl:template name="distinctvalues">
        <xsl:param name="values"/>
        <xsl:variable name="firstvalue" select="substring-before($values, '|')"/>
        <xsl:variable name="restofvalue" select="substring-after($values, '|')"/>
        <xsl:if test="not(contains($values, '|'))">
            <xsl:value-of select="$values"/>
        </xsl:if>
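        <!-- not(...) states the intent directly; "= false" compares against a child element named "false" -->
        <xsl:if test="not(contains($restofvalue, $firstvalue))">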
            <xsl:value-of select="$firstvalue"/>
            <xsl:text>|</xsl:text>
        </xsl:if>
        <xsl:if test="$restofvalue != ''">
            <xsl:call-template name="distinctvalues">
                <xsl:with-param name="values" select="$restofvalue" />
            </xsl:call-template>
        </xsl:if>
    </xsl:template>

</xsl:stylesheet>

with a sample input of:

<root>1111-1|1111-1|1111-3|1111-4|1111-5|1111-3</root>

and the output is

<output>1111-1|1111-4|1111-5|1111-3</output>

**** EDIT ****

Per Michael's comment below, here is a revised stylesheet that uses a Saxon extension:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://icl.com/saxon"
    exclude-result-prefixes="saxon"
    version="1.1">

    <xsl:output omit-xml-declaration="yes"/>

    <xsl:variable name="aaa">
        <xsl:call-template name="tokenizeString">
            <xsl:with-param name="list" select="root"/>
            <xsl:with-param name="delimiter" select="'|'"/>
        </xsl:call-template>
    </xsl:variable>

    <xsl:template match="/">
        <xsl:for-each select="saxon:node-set($aaa)/token[not(preceding::token/. = .)]">
            <xsl:if test="position() &gt; 1">
                <xsl:text>|</xsl:text>
            </xsl:if>
            <xsl:value-of select="."/>
        </xsl:for-each>
    </xsl:template>

    <xsl:template name="tokenizeString">
        <!--passed template parameter -->
        <xsl:param name="list"/>
        <xsl:param name="delimiter"/>
        <xsl:choose>
            <xsl:when test="contains($list, $delimiter)">
                <token>
                    <!-- get everything in front of the first delimiter -->
                    <xsl:value-of select="substring-before($list,$delimiter)"/>
                </token>
                <xsl:call-template name="tokenizeString">
                    <!-- store anything left in another variable -->
                    <xsl:with-param name="list" select="substring-after($list,$delimiter)"/>
                    <xsl:with-param name="delimiter" select="$delimiter"/>
                </xsl:call-template>
            </xsl:when>
            <xsl:otherwise>
                <xsl:choose>
                    <xsl:when test="$list = ''">
                        <xsl:text/>
                    </xsl:when>
                    <xsl:otherwise>
                        <token>
                            <xsl:value-of select="$list"/>
                        </token>
                    </xsl:otherwise>
                </xsl:choose>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>

</xsl:stylesheet>

given an input of:

<root>cat|cat|catalog|catalog|red|red|wired|wired</root>

it outputs

cat|catalog|red|wired

and with this input:

<root>1111-1|1111-1|1111-3|1111-4|1111-5|1111-3</root>

the output is

1111-1|1111-3|1111-4|1111-5
Joel M. Lamsen