Muenchian Grouping - group within a node, not within the entire document

Question

I'm trying to use Muenchian grouping in my XSLT to group matching nodes, but I only want to group within a parent node, not across the entire source XML document.

Given XSLT and XML as follows (apologies for the length of my sample code):

XSLT

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl"> 
 <xsl:output method="html" indent="yes"/>

 <xsl:key name="contacts-by-surname" match="contact" use="surname" />
 <xsl:template match="records">
  <xsl:for-each select="contact[count(. | key('contacts-by-surname', surname)[1]) = 1]">
   <xsl:sort select="surname" />
   <xsl:value-of select="surname" />,<br />
   <xsl:for-each select="key('contacts-by-surname', surname)">
    <xsl:sort select="forename" />
    <xsl:value-of select="forename" /> (<xsl:value-of select="title" />)<br />
   </xsl:for-each>
  </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>

XML

<root>
 <records>
  <contact id="0001">
   <title>Mr</title>
   <forename>John</forename>
   <surname>Smith</surname>
  </contact>
  <contact id="0002">
   <title>Dr</title>
   <forename>Amy</forename>
   <surname>Jones</surname>
  </contact>
  <contact id="0003">
   <title>Mrs</title>
   <forename>Mary</forename>
   <surname>Smith</surname>
  </contact>
  <contact id="0004">
   <title>Ms</title>
   <forename>Anne</forename>
   <surname>Jones</surname>
  </contact>
  <contact id="0005">
   <title>Mr</title>
   <forename>Peter</forename>
   <surname>Smith</surname>
  </contact>
  <contact id="0006">
   <title>Dr</title>
   <forename>Indy</forename>
   <surname>Jones</surname>
  </contact>
 </records>
 <records>
  <contact id="0001">
   <title>Mr</title>
   <forename>James</forename>
   <surname>Smith</surname>
  </contact>
  <contact id="0002">
   <title>Dr</title>
   <forename>Mandy</forename>
   <surname>Jones</surname>
  </contact>
  <contact id="0003">
   <title>Mrs</title>
   <forename>Elizabeth</forename>
   <surname>Smith</surname>
  </contact>
  <contact id="0004">
   <title>Ms</title>
   <forename>Sally</forename>
   <surname>Jones</surname>
  </contact>
  <contact id="0005">
   <title>Mr</title>
   <forename>George</forename>
   <surname>Smith</surname>
  </contact>
  <contact id="0006">
   <title>Dr</title>
   <forename>Harry</forename>
   <surname>Jones</surname>
  </contact>
 </records>
</root>

RESULT

Jones,
Amy (Dr)
Anne (Ms)
Harry (Dr)
Indy (Dr)
Mandy (Dr)
Sally (Ms)

Smith,
Elizabeth (Mrs)
George (Mr)
James (Mr)
John (Mr)
Mary (Mrs)
Peter (Mr)

How do I group within each <records> and achieve this result:

Jones,
Amy (Dr)
Anne (Ms)
Indy (Dr)

Smith,
John (Mr)
Mary (Mrs)
Peter (Mr)

Jones,
Harry (Dr)
Mandy (Dr)
Sally (Ms)

Smith,
Elizabeth (Mrs)
George (Mr)
James (Mr)

Kristian, in your desired result, the forenames are not sorted within the surnames. I am assuming they should be since you are explicitly sorting on the forename in your xslt. — Rashmi Pandit, Nov 18 '09 at 05:41
Good point about the ordering - have updated question to have sorted forenames in the result. — kristian, Nov 18 '09 at 05:59

Rashmi Pandit · Accepted Answer · 2009-11-18T12:48:32.953

Took me some time ... I was about to give up but continued nevertheless :)

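The drawback of the key function is that a key is always built over the entire XML document. To restrict it, you need to concatenate extra information into the key value to make it more specific. In the example below, I concatenate the generated ID of the parent records node, generate-id(..), with the surname, so that I get a separate set of distinct surnames for each records node. (My first version used the position of the records node instead; see the comments.)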

Here's the xslt:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl">
  <xsl:output method="html" indent="yes"/>
  <xsl:key name="distinct-surname" match="contact" use="concat(generate-id(..), '|', surname)"/>
  <xsl:template match="records">
    <xsl:for-each select="contact[generate-id() = generate-id(key('distinct-surname', concat(generate-id(..), '|', surname))[1])]">
      <xsl:sort select="surname" />
      <xsl:value-of select="surname" />,<br />
      <xsl:for-each select="key('distinct-surname', concat(generate-id(..), '|', surname))">
        <xsl:sort select="forename" />
        <xsl:value-of select="forename" /> (<xsl:value-of select="title" />)<br />
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>  
</xsl:stylesheet>

This is the result:

Jones,
Amy (Dr)
Anne (Ms)
Indy (Dr)
Smith,
John (Mr)
Mary (Mrs)
Peter (Mr)
Jones,
Harry (Dr)
Mandy (Dr)
Sally (Ms)
Smith,
Elizabeth (Mrs)
George (Mr)
James (Mr)

Please note that the result is sorted on the forenames too. If you don't want to sort by forename, remove the line <xsl:sort select="forename" />.

That's what I would have done, +1. I propose a tiny change: Instead of `concat(count(parent::*/preceding-sibling::*), surname)`, use `concat(generate-id(..), '|', surname)`. It's shorter, more efficient, and a bit safer because of the additional delimiter char. — Tomalak, Nov 18 '09 at 10:56
Tomalak, I have edited the xslt as per your suggestion. Thanks :) — Rashmi Pandit, Nov 18 '09 at 12:48

score 3 · Answer 2 · answered Nov 18 '09 at 07:39

There is a simpler method: add a predicate which ensures that the contacts involved in the Muenchian test are children of the current records node.

<xsl:key name="contacts-by-surname" match="contact" use="surname" />
<xsl:template match="records">
  <xsl:for-each select="contact[count(. | key('contacts-by-surname', surname)[generate-id(parent::records) = generate-id(current())][1]) = 1]">
   <xsl:sort select="surname" />
   <xsl:value-of select="surname" />,<br />
   <xsl:for-each select="key('contacts-by-surname', surname)[generate-id(parent::records) = generate-id(current()/parent::records)]">
    <xsl:sort select="forename" />
    <xsl:value-of select="forename" /> (<xsl:value-of select="title" />)<br />
   </xsl:for-each>
  </xsl:for-each>
</xsl:template>
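
A note on current() in the snippet above: in the outer select it refers to the records node matched by the template, while inside the inner for-each it refers to the contact being processed, which is why the second predicate uses current()/parent::records. As a sketch of the same idea, the ID of the records node can be stored once in a variable so that both predicates read the same. The variable name records-id is made up for illustration and is not part of the original answer.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="yes"/>

  <!-- Same document-wide key as above: contacts indexed by surname only -->
  <xsl:key name="contacts-by-surname" match="contact" use="surname" />

  <xsl:template match="records">
    <!-- Hypothetical variable name: ID of the records node being processed -->
    <xsl:variable name="records-id" select="generate-id()" />

    <!-- First contact per surname, considering only contacts whose parent is this records node -->
    <xsl:for-each select="contact[count(. | key('contacts-by-surname', surname)[generate-id(..) = $records-id][1]) = 1]">
      <xsl:sort select="surname" />
      <xsl:value-of select="surname" />,<br />

      <!-- All contacts with this surname, again restricted to this records node -->
      <xsl:for-each select="key('contacts-by-surname', surname)[generate-id(..) = $records-id]">
        <xsl:sort select="forename" />
        <xsl:value-of select="forename" /> (<xsl:value-of select="title" />)<br />
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

This should group the same way as the snippet above. It still filters the document-wide key result for every contact, so the efficiency concern raised in the comments applies here as well.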


Erlock


It may be simpler, but it also is less efficient. I would say that `contact[generate-id() = generate-id(…[…])]` is O(n²) in the worst case, while @Rashmi Pandit's `contact[generate-id() = generate-id(…)]` is O(n). – Tomalak Nov 18 '09 at 11:12
Maybe less efficient, but more robust I think. Concatenating strings into compound keys assumes that the separator string never occurs in any of the strings used. I prefer deterministic behaviour over fastest run. :) – Erlock Nov 18 '09 at 13:46
Hm… I can think of a way that id-value is ambiguous (id `"key-30"`, value `"0"` vs. id `"key-300"`, value `""`), but for id-separator-value (it would be `"id-30|0"` vs. `"id-300|"`)? The presence of the separator in the value is not relevant, IMHO. Am I missing something? – Tomalak Nov 18 '09 at 15:10
id "0|1" & value "2", and id "0" & value "1|2" will produce the same key "0|1|2" with "|" as separator. I agree that in this particular case the id should not contain any "|" (generate-id() returns alphanumeric ASCII characters, according to the W3C XSLT specification), but the issue is the same if you use two or more values. In many cases compound keys built from concatenated strings are not safe, so I prefer not to use them at all as a "good practice". – Erlock Nov 18 '09 at 16:35
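
To make the collision described in the last comment concrete, here is a minimal sketch. The element names items, item, a and b and the key name by-a-and-b are made up for illustration; a and b stand in for two free-text values joined with "|".

```xml
<?xml version="1.0" encoding="utf-8"?>
<!--
  Hypothetical input document:

  <items>
    <item><a>0|1</a><b>2</b></item>
    <item><a>0</a><b>1|2</b></item>
  </items>
-->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Compound key: two free-text fields joined with '|' -->
  <xsl:key name="by-a-and-b" match="item" use="concat(a, '|', b)"/>

  <xsl:template match="/">
    <!-- Both items produce the key string "0|1|2", so this prints 2 -->
    <xsl:value-of select="count(key('by-a-and-b', '0|1|2'))"/>
  </xsl:template>
</xsl:stylesheet>
```

With generate-id(..) as the first part, as in the accepted answer, the first "|" always marks the boundary because the generated ID consists only of ASCII alphanumeric characters. The ambiguity only appears when the part before the separator can itself contain the separator.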
