How to count distinct values in a node?

Question

How to count distinct values in a node in XSLT?

Example: I want to count the number of distinct countries in the Country nodes; in this case, it would be 3.

<Artists_by_Countries>
    <Artist_by_Country>
        <Location_ID>62</Location_ID>
        <Artist_ID>212</Artist_ID>
        <Country>Argentina</Country>
    </Artist_by_Country>
    <Artist_by_Country>
        <Location_ID>4</Location_ID>
        <Artist_ID>108</Artist_ID>
        <Country>Australia</Country>
    </Artist_by_Country>
    <Artist_by_Country>
        <Location_ID>4</Location_ID>
        <Artist_ID>111</Artist_ID>
        <Country>Australia</Country>
    </Artist_by_Country>
    <Artist_by_Country>
        <Location_ID>12</Location_ID>
        <Artist_ID>78</Artist_ID>
        <Country>Germany</Country>
    </Artist_by_Country>
</Artists_by_Countries>

score 28 · Accepted Answer · answered Sep 30 '08 at 16:53

If you have a large document, you probably want to use the "Muenchian Method", which is usually used for grouping, to identify the distinct nodes. Declare a key that indexes the things you want to count by the values that are distinct:

<xsl:key name="artists-by-country" match="Artist_by_Country" use="Country" />

Then you can get the <Artist_by_Country> elements that have distinct countries using:

/Artists_by_Countries
  /Artist_by_Country
    [generate-id(.) =
     generate-id(key('artists-by-country', Country)[1])]

and you can count them by wrapping that in a call to the count() function.
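Put together, the key declaration and the count() call might look like the following XSLT 1.0 stylesheet. This is a minimal sketch, assuming the document structure from the question and plain-text output; the single root template is an illustrative choice rather than part of the technique.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Index every Artist_by_Country element by the value of its Country child -->
  <xsl:key name="artists-by-country" match="Artist_by_Country" use="Country"/>

  <xsl:template match="/">
    <!-- Keep only the first element of each key group, then count the survivors -->
    <xsl:value-of select="count(
        /Artists_by_Countries/Artist_by_Country
          [generate-id(.) = generate-id(key('artists-by-country', Country)[1])])"/>
  </xsl:template>
</xsl:stylesheet>
```

Run against the sample document in the question, this should output 3.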

Of course in XSLT 2.0, it's as simple as

count(distinct-values(/Artists_by_Countries/Artist_by_Country/Country))
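As a complete stylesheet, that expression could be used as sketched below. This assumes an XSLT 2.0 (or later) processor such as Saxon; XSLT 1.0 processors do not provide distinct-values(). The comparison is on string values, so "Australia" and "australia" would be counted as two countries.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- distinct-values() removes duplicate Country values; count() sizes the result -->
    <xsl:value-of select="count(distinct-values(
        /Artists_by_Countries/Artist_by_Country/Country))"/>
  </xsl:template>
</xsl:stylesheet>
```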

My eyes were glazing over the top bunch of lines, then I found the nugget at the end. – Nicholas DiPiazza, Apr 25 '16 at 03:21

score 6 · Answer 2 · answered Sep 30 '08 at 14:16

In XSLT 1.0 this isn't obvious, but the following should give you the idea (it counts distinct Location_ID values; substitute Country to count countries):

count(//Artist_by_Country[not(Location_ID=preceding-sibling::Artist_by_Country/Location_ID)]/Location_ID)

The more elements in your XML the longer this takes, as it checks every single preceding sibling of every single element.
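The same preceding-sibling idea, adapted to compare Country values as the question asks, can be sketched as a full XSLT 1.0 stylesheet. This is an adaptation under the assumption that all Artist_by_Country elements are siblings, as in the sample; it is not the exact expression given above.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Count only elements whose Country has not appeared in an earlier sibling -->
    <xsl:value-of select="count(
        /Artists_by_Countries/Artist_by_Country
          [not(Country = preceding-sibling::Artist_by_Country/Country)])"/>
  </xsl:template>
</xsl:stylesheet>
```

For the sample document this should output 3, at the cost of scanning all earlier siblings for every element.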

answered Sep 30 '08 at 14:16 by samjudson

Not sure about performance, but for XSLT 1.0, this seems like a cleaner solution than requiring the <xsl:key> element in the top voted solution. – Jay Stevens Jan 30 '13 at 21:02
There are usually at least two ways to skin a cat in XSLT, and which is best will depend on your particular circumstances. I suspect xsl:key can be very fast in a good processor on large documents compared to my method above. – samjudson Feb 03 '13 at 21:55

score 5 · Answer 3 · answered Sep 30 '08 at 14:13

Try something like this:

count(//Country[not(following::Country/text() = text())])

"Give me the count of all Country nodes without a following Country with matching text"

The interesting bit of that expression, IMO, is the following axis.

You could probably also drop the first /text() and replace the second text() with . (the context node), giving count(//Country[not(following::Country = .)]).

answered Sep 30 '08 at 14:13 by Chris Marasti-Georg

This will only work if the nodes are sorted and the like Country values are therefore consecutive. – dacracot Sep 30 '08 at 14:38
No, it will always work. following:: works on the entire document; if there is ANY Country after the context one that has the same value, that node will not be counted. – Chris Marasti-Georg Sep 30 '08 at 14:59
This should be the accepted answer, although the 2.0 option is great for people that can use it. – Moss Jan 24 '18 at 06:38

score 0 · Answer 4 · answered Sep 30 '08 at 14:21

If you have control of the XML generation, you could add an attribute such as distinct='true' to the Country node on the first occurrence of a country, flagging that country as "used", and then not add the attribute again when you come across the same country later.

You could then do

<xsl:for-each select="Artists_by_Countries/Artist_by_Country/Country[@distinct='true']" />
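To count rather than iterate, the same predicate can go inside count(). A minimal sketch, assuming the generator really does emit a distinct='true' attribute on the first Country of each value; that attribute is this answer's hypothetical convention and is not present in the question's sample XML.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Assumes input such as <Country distinct="true">Australia</Country>;
         the distinct attribute is hypothetical and must be added by the XML generator -->
    <xsl:value-of select="count(
        /Artists_by_Countries/Artist_by_Country/Country[@distinct='true'])"/>
  </xsl:template>
</xsl:stylesheet>
```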
