Using for-each-group for high performance XSLT

Question

I have an XSLT 1.0 stylesheet that works without problems. I want to move it to XSLT 2.0 and use xsl:for-each-group so that it performs well. Is that possible? How? Please explain.

I have many places like

    <xsl:if test="test condition">
      <xsl:for-each select="wo:tent">
        <width aidwidth='{/wo:document/styles [@wo:name=current()/@wo:style-name]/@wo:width}'/>
      </xsl:for-each>
    </xsl:if>

ADDED

<xsl:template match="wo:country">
            <xsl:for-each select="@*">
                <xsl:copy/>
            </xsl:for-each>
            <xsl:variable name="states" select="wo:pages[@xil:style = &quot;topstates&quot; or @xil:style = &quot;toppage-title&quot;]"/>
            <xsl:variable name="provinces" select="wo:pages[@xil:style = &quot;topprovinces&quot;]"/>
            <xsl:choose>
                <xsl:when test="$states">
                    <xsl:apply-templates select="$states[2]/preceding-sibling::*"/>
                    <xsl:apply-templates select="$states[2]" mode="states">
                        <xsl:with-param name="states" select="$states[position() != 2]"/>
                    </xsl:apply-templates>
                </xsl:when>
                <xsl:when test="$provinces">
                    <xsl:apply-templates select="$provinces[2]/preceding-sibling::*"/>
                    <xsl:apply-templates select="$provinces[2]" mode="provinces">
                        <xsl:with-param name="provinces" select="$provinces[position() != 2]"/>
                    </xsl:apply-templates>
                </xsl:when>
                <xsl:otherwise>
                    <xsl:apply-templates/>
                </xsl:otherwise>
            </xsl:choose>
    </xsl:template>

THE SOURCE

<?xml version="1.0" encoding="UTF-8"?>
<wo:country>
   some stuff
</wo:country>

Well, does the stylesheet use XSLT 1.0 style Muenchian grouping that could be replaced by `for-each-group`? Or does it use some other, even more complicated XSLT 1.0 approach like sibling recursion that could be replaced by `for-each-group group-starting-with`? I am afraid that without seeing the XSLT code it is not possible to make suggestions on how to rewrite it. Your question sounds a bit as if using `for-each-group` is some magic to improve performance. I rather see it as a tool XSLT 2.0 offers to write fewer lines of code that are easier to understand, manage and extend than XSLT 1.0. — Martin Honnen, Jul 15 '12 at 10:29
thank you martin. yes I have things like following-sibling in the sheet. and what can be replaced in XSLT 1.0 with for-each-group in XSLT 2.0? — Setinger, Jul 15 '12 at 10:59
See http://www.w3.org/TR/xslt20/#grouping-examples for samples demonstrating the grouping features in XSLT 2.0. I am not aware of any article showing how to rewrite XSLT 1.0 grouping samples. — Martin Honnen, Jul 15 '12 at 11:49
Thank you very much @Martin If my style sheet works as it is now will using for-each-group do any good? I mean will it give me some help? — Setinger, Jul 15 '12 at 11:55
It depends on the type of grouping. If you group by common value, probably not much. If you group by adjacent or starting-with, then yes. Of course there are lots of variables: scale of your project, XSLT vendor etc. — Sean B. Durkin, Jul 15 '12 at 12:12
Another advantage to re-writing to take advantage of XSLT 2.0 grouping constructs is that it may be easier to read and maintain. — Mads Hansen, Jul 15 '12 at 13:08
Yes thats good. I want the style sheet to be more and more maintainable. — Setinger, Jul 15 '12 at 13:23
@Setinger: Please, edit the question and provide a complete XML document (as small as possible) and the complete XSLT code (as small as possible) that performs the grouping. If you do this, then many people can give you a *specific*, corresponding XSLT 2.0 transformation that uses `xsl:for-each-group` to produce the same result. In case in your current XSLT 1.0 code you are using the Muenchian method for grouping, do not expect the XSLT 2.0 transformation to be faster -- the Muenchian method is very efficient (has sublinear time complexity -- sometimes very close to O(1)). — Dimitre Novatchev, Jul 15 '12 at 15:35
@Setinger:Yes, what I recommend is that you need to edit the question and put a complete (but short) code example there -- the comment format is not convenient for sharing code. In its current form the question is too-general and vague and it will have only general answers. If you present the code, then many people will have enough information to tell whether this is worth rewriting (as efficiency is concerned) using `xsl:for-each-group` or not. — Dimitre Novatchev, Jul 15 '12 at 17:42
@Setinger: No, you have provided a small fragment of the XSLT code and *no* source XML document. Please, do so. You only need to provide a complete (but small) example that demonstrates the way your code handles grouping. From what you already provided, it can be seen that you *don't* use Muenchian grouping -- this means that rewriting the code in XSLT 2.0 and using `xsl:for-each-group` may result in more efficient code (as will rewriting the code to use Muenchian grouping). But you need to provide a *complete* example. You also need to read about and understand `xsl:for-each-group`. — Dimitre Novatchev, Jul 16 '12 at 05:31
@DimitreNovatchev yes I went through good stuff by Sean and Martin. Can you tell me how I can use xsl:for-each-group as you suggest to make this more efficient. I dont think I have Muenchian grouping in my code. It is very long. Thats why I didnt put all here. ^_^ — Setinger, Jul 16 '12 at 18:37
@DimitreNovatchev I added more. Can you tell how to improve the performance? Is it good as it is now? — Setinger, Jul 17 '12 at 10:00
Setinger: Your big problem is that you haven't provided the source XML document (small, please), and you only provided one template of the complete transformation. You haven't shown at all what the wanted result must be. Please, *edit* the question and provide this important, necessary information. — Dimitre Novatchev, Jul 17 '12 at 11:50

score 8 · Accepted Answer · edited Oct 18 '17 at 23:26

I have assumed that you want an in-depth description of xsl:for-each-group and how to use it. If this is not what you are asking for, then please let me know.

The xsl:for-each-group instruction, new in XSLT 2.0, takes a sequence of items and groups them. The sequence of items is called "the population", and the groups are just called groups. The instruction processes each group in turn.

Possible attributes of the xsl:for-each-group instruction include:

select
group-by
group-adjacent
group-starting-with
group-ending-with
collation

@select is mandatory, and exactly one of group-by, group-adjacent, group-starting-with and group-ending-with must also be present. @collation is optional. The instruction can take any number of xsl:sort children (but they must come first), followed by a sequence constructor. A "sequence constructor" is the term for the sequence-emitting instructions that go inside templates and the like.

@select

The select attribute specifies an XPath expression which evaluates to the population to be grouped.

@group-by

The group-by attribute specifies an XPath expression, which you use when the type of grouping is by common value. Every item in the population that evaluates to the same group-by value as another is in the same group as that other.

XSLT 1.0 Muenchian grouping is not too difficult when the type of grouping is group by common value. There are two more common forms of grouping: group adjacent items by similar value; and group an adjacent run of items whose group is demarcated either at the end or at the beginning by some test. While both these forms of grouping are still possible with Muenchian, it becomes relatively complex. Muenchian on these types will also be less efficient at scale, because of the use of the sibling axes.
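
For reference, here is a minimal sketch of Muenchian grouping by common value in XSLT 1.0. The input vocabulary (cities, city, @name, @country) and the key name city-by-country are hypothetical and chosen only for illustration; they do not come from the question.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed input: <cities><city name="..." country="..."/>...</cities> -->

  <!-- Index every city element by its country attribute. -->
  <xsl:key name="city-by-country" match="city" use="@country"/>

  <xsl:template match="cities">
    <countries>
      <!-- Select only the first city of each country. The union of a city
           with the first node returned by the key has exactly one member
           only when both are the same node. -->
      <xsl:for-each select="city[count(. | key('city-by-country', @country)[1]) = 1]">
        <country name="{@country}">
          <!-- The key lookup returns every member of the group. -->
          <xsl:for-each select="key('city-by-country', @country)">
            <city><xsl:value-of select="@name"/></city>
          </xsl:for-each>
        </country>
      </xsl:for-each>
    </countries>
  </xsl:template>

</xsl:stylesheet>
```

In XSLT 2.0 the same grouping would typically be written with xsl:for-each-group select="city" group-by="@country", using current-group() in place of the second key lookup.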

Another advantage of XSLT 2.0 that comes to mind is that Muenchian only works on node sets, whereas xsl:for-each-group is broader in application because it works on a sequence of items, not just nodes.

The result of the @group-by expression will be a sequence of items. This sequence is atomized and de-duped. The population item being tested will be a member of one group per value. It's a strange consequence that, with @group-by, an item may be a member of more than one group, or perhaps even none. Although I suspect that anything that you can do in XSLT 2.0, you can, by some tortuous path, do in XSLT 1.0, the ability to put an item into two groups is something that would be quite fiddly to do in XSLT 1.0 Muenchian.
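
A minimal sketch of the multi-group behaviour described above. The input vocabulary (books, book, @title, author) is an assumption made for illustration only. A book with two author children lands in two groups, and a book with no author child lands in none.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed input:
       <books>
         <book title="..."><author>...</author><author>...</author></book>
         ...
       </books> -->

  <xsl:template match="books">
    <authors>
      <!-- group-by="author" yields one grouping key per distinct author
           child, so a co-authored book joins several groups. -->
      <xsl:for-each-group select="book" group-by="author">
        <author name="{current-grouping-key()}">
          <xsl:for-each select="current-group()">
            <title><xsl:value-of select="@title"/></title>
          </xsl:for-each>
        </author>
      </xsl:for-each-group>
    </authors>
  </xsl:template>

</xsl:stylesheet>
```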

@group-adjacent

The attributes group-by, group-adjacent, group-starting-with and group-ending-with are mutually exclusive because they specify different kinds of grouping. With @group-adjacent, items that have a common value and are adjacent in the population are grouped together. Unlike @group-by, @group-adjacent must evaluate to, after atomization, a single atomic value.
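
A minimal sketch of @group-adjacent, assuming a hypothetical input in which doc contains a mix of p and bullet children and each run of consecutive bullet elements should be wrapped in a list. The element names are illustrative assumptions, not taken from the question.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed input: <doc><p/><bullet/><bullet/><p/><bullet/></doc> -->

  <xsl:template match="doc">
    <doc>
      <!-- The key is a single boolean per item: true for bullet elements,
           false for everything else. Each run of equal keys is one group. -->
      <xsl:for-each-group select="*" group-adjacent="boolean(self::bullet)">
        <xsl:choose>
          <xsl:when test="current-grouping-key()">
            <!-- A run of adjacent bullets becomes one list. -->
            <list>
              <xsl:for-each select="current-group()">
                <item><xsl:value-of select="."/></item>
              </xsl:for-each>
            </list>
          </xsl:when>
          <xsl:otherwise>
            <!-- Anything else is copied through unchanged. -->
            <xsl:copy-of select="current-group()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </doc>
  </xsl:template>

</xsl:stylesheet>
```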

@group-starting-with

Unlike select, group-adjacent and group-by, this attribute does not specify an XPath select expression, but rather a pattern, in the same way that xsl:template/@match specifies a pattern, not a selection. If an item in the population passes the pattern test or is the first item in the population then it starts a new group. Otherwise the item continues the group from the previous item.

Martin mentioned the spec examples (w3.org/TR/xslt20/#grouping-examples). From that reference, I am going to copy the example entitled "Identifying a Group by its Initial Element", but alter it slightly to emphasize the point about the initial item of the population.

So this is our input document (copied from w3 spec. The inclusion of the orphaned line is mine) ...

<body>
  <p>This is an orphaned paragraph.</p>
  <h2>Introduction</h2>
  <p>XSLT is used to write stylesheets.</p>
  <p>XQuery is used to query XML databases.</p>
  <h2>What is a stylesheet?</h2>
  <p>A stylesheet is an XML document used to define a transformation.</p>
  <p>Stylesheets may be written in XSLT.</p>
  <p>XSLT 2.0 introduces new grouping constructs.</p>
</body>

... what we want to do is define groups as nodes starting with h2 and include all the following p up until the next h2. The example solution given by w3 is to use @group-starting-with ...

<xsl:template match="body">
  <chapter>
    <xsl:for-each-group select="*" group-starting-with="h2">
      <section title="{self::h2}">
        <xsl:for-each select="current-group()[self::p]">
          <para><xsl:value-of select="."/></para>
        </xsl:for-each>
      </section>
    </xsl:for-each-group>
  </chapter>
</xsl:template>

In the spec example, when the input does not contain an orphan line, this produces the desired result ...

<chapter>
  <section title="Introduction">
    <para>XSLT is used to write stylesheets.</para>
    <para>XQuery is used to query XML databases.</para>
  </section> 
  <section title="What is a stylesheet?">
    <para>A stylesheet is an XML document used to define a transformation.</para>
    <para>Stylesheets may be written in XSLT.</para>
    <para>XSLT 2.0 introduces new grouping constructs.</para>
  </section>
</chapter>

Although in our particular case we get instead ...

<chapter>
   <section title="">
      <para>This is an orphaned paragraph.</para>
   </section>
   <section title="Introduction">
      <para>XSLT is used to write stylesheets.</para>
      <para>XQuery is used to query XML databases.</para>
   </section>
   <section title="What is a stylesheet?">
      <para>A stylesheet is an XML document used to define a transformation.</para>
      <para>Stylesheets may be written in XSLT.</para>
      <para>XSLT 2.0 introduces new grouping constructs.</para>
   </section>
</chapter>

If the initial section for the orphaned lines is undesired, there are easy solutions. I won't go into them now. My point is just to highlight the fact that the first group resulting from @group-starting-with can be an 'orphan' group. By 'orphan', I mean a group whose head node does not fit the specified pattern.
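
One possible way to deal with the orphan group is sketched below. It relies on the fact that the context item inside xsl:for-each-group is the first item of the current group, so testing self::h2 tells a real section apart from an orphan group. Emitting the orphaned paragraphs directly under chapter, with no section wrapper, is an assumption made for illustration; other treatments are equally plausible.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="body">
    <chapter>
      <xsl:for-each-group select="*" group-starting-with="h2">
        <xsl:choose>
          <!-- The group really starts with an h2: emit a section. -->
          <xsl:when test="self::h2">
            <section title="{.}">
              <xsl:for-each select="current-group()[self::p]">
                <para><xsl:value-of select="."/></para>
              </xsl:for-each>
            </section>
          </xsl:when>
          <!-- Orphan group: its first item is not an h2, so the
               paragraphs are emitted without a section wrapper. -->
          <xsl:otherwise>
            <xsl:for-each select="current-group()[self::p]">
              <para><xsl:value-of select="."/></para>
            </xsl:for-each>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </chapter>
  </xsl:template>

</xsl:stylesheet>
```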

@collation

The collation attribute specifies a collation URI and identifies a collation used to compare strings for equality.

current-group()

Within the xsl:for-each-group the current-group() function returns the current group being processed as a sequence of items.

current-grouping-key()

Within the xsl:for-each-group the current-grouping-key() function returns the grouping key of the current group. I am not sure, but I believe that this can only be an atomic type. Also not sure, but I believe that this function is only applicable to @group-by and @group-adjacent type of grouping.

@group-by versus @group-adjacent

In some scenarios you will have a choice between these two grouping types with the same functional result. When this is the case @group-adjacent is to be preferred over @group-by, because it will likely be more efficient to process.

Pattern versus Select

Some XSLT 2.0 instruction attributes contain select expressions. Michael Kay calls these "XPath expressions". Personally, when juxtaposing against patterns, I feel a better description would be "select expression". Other attributes contain patterns or "match expressions". While the two share much of the same syntax, they are very different beasts. The similarity between the two often makes XSLT beginners think of xsl:template/@match not as a pattern, but as a select expression. The consequence has been a lot of confusion from beginners about the value of the position() function within a template's sequence constructor. As stated earlier, in xsl:for-each-group, @select, @group-by and @group-adjacent are select expressions, but @group-starting-with and @group-ending-with are patterns. So here is the difference:

Select expressions are like a function. The input is a context document, context sequence, context item, context position and of course the actual expression. The output is a sequence of items. Depending on where this is actually used, this could become the next context sequence. The default axis is child:: .
Unlike a select expression, the default axis for a pattern is self:: . The pattern is also like a function. Its inputs are as before, and its output is not a sequence, but a boolean. Some item is being tested to see if it matches the pattern or not. The item being tested is made the context item. The match expression is temporarily evaluated as if it were a select expression. Then the returned sequence is tested to see if the context item is a member or not. The returned sequence is then discarded. The result is true or 'match' if it was a member, and false otherwise.
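
A small sketch of the position() confusion mentioned above, under the assumption of a hypothetical input with three item children of list, where only the second and third carry keep="yes". The pattern in @match only decides which template fires. The value of position() comes from the sequence chosen by the select expression in xsl:apply-templates.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed input:
       <list>
         <item keep="no">a</item>
         <item keep="yes">b</item>
         <item keep="yes">c</item>
       </list> -->

  <xsl:template match="list">
    <kept>
      <!-- Select expression: builds the sequence (b, c). -->
      <xsl:apply-templates select="item[@keep = 'yes']"/>
    </kept>
  </xsl:template>

  <!-- Pattern: tests each selected node, it does not select anything. -->
  <xsl:template match="item">
    <!-- position() is 1 for b and 2 for c, the positions within the
         selected sequence, not the sibling positions 2 and 3. -->
    <entry pos="{position()}"><xsl:value-of select="."/></entry>
  </xsl:template>

</xsl:stylesheet>
```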

thank you very much Sean. can you tell me how things in a xslt 1.0 style sheet can be changed to use xsl:for-each-group? — Setinger, Jul 15 '12 at 11:59
Thank you very much Sean. ^_^ I give +1 for this great explanation. This is very good stuff. I want to see a grouping in XSLT 1.0 and then same thing done in XSLT 2.0. Please help if I dont bother you too much. — Setinger, Jul 16 '12 at 18:34

score 2 · Answer 2 · answered Jul 16 '12 at 08:08

Sean has provided a wonderful overview of xsl:for-each-group, which was very generous, but it doesn't really seem to be an answer to your question.

You've shown a fragment of XSLT code, and you've said you want faster performance. But the fragment you showed is not doing grouping, it is doing a join. There are two ways you can speed up a join. Either use an XSLT processor such as Saxon-EE that does automatic join optimization, or optimize it by hand using keys. For example, given this expression:

/wo:document/styles [@wo:name=current()/@wo:style-name]/@wo:width

you could define a key

<xsl:key name="style-name-key" match="styles" use="@wo:name"/>

and then replace the expression by

key('style-name-key', @wo:style-name)/@wo:width
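
Put together, the key-based version of the fragment from the question might look like the sketch below. The namespace URI urn:example:wo and the enclosing template match wo:page are placeholders assumed for illustration, since the question shows neither.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:wo="urn:example:wo"
    exclude-result-prefixes="wo">

  <!-- Built once per document: maps each style name to its styles element. -->
  <xsl:key name="style-name-key" match="styles" use="@wo:name"/>

  <!-- wo:page is a hypothetical parent of the wo:tent elements. -->
  <xsl:template match="wo:page">
    <xsl:for-each select="wo:tent">
      <!-- Indexed lookup instead of scanning /wo:document/styles
           for every wo:tent. -->
      <width aidwidth="{key('style-name-key', @wo:style-name)/@wo:width}"/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```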

Michael Kay


thank you Michael. I am using Saxon. If I add keys too, will it make the code even better? And is there a like thing for grouping too? – Setinger Jul 16 '12 at 18:29
If you're using Saxon-EE, then the -explain output will help you determine whether the optimizer has found a way of doing an indexed join (though the output isn't easy to digest). Whether manual optimization using keys can beat the automatic optimization will vary on a case-by-case basis. For grouping, Saxon's for-each-group implementation will usually be at least as fast as anything you can write by hand. – Michael Kay Jul 17 '12 at 13:55
thank you Michael. So I dont worry about doing it manually thanks to Saxon. ^_^ – Setinger Jul 17 '12 at 16:12