
I would like to use XSLT 2.0 (saxon9he.jar) to split data into groups by position. In this sample, I try to split market products into bags with 4 items in each bag. My testing indicates that position() is scoped to the parent, so that potato is at position 2 as a child of the vegetable department rather than at position 5 in my selection of products. I would like to base the groups on the position within the selection, not the position within the parent.

XML Dataset:

<market>
    <department name="fruit">
        <product>apple</product>
        <product>banana</product>
        <product>grape</product>
    </department>
    <department name="vegetable">
        <product>carrot</product>
        <product>potato</product>
        <product>squash</product>
    </department>
    <department name="paper">
        <product>plates</product>
        <product>napkins</product>
        <product>cups</product>
    </department>
    <department name="cloths">
        <product>shirts</product>
        <product>shorts</product>
        <product>socks</product>
    </department>
</market>

XSL Template:

<xsl:transform version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:fn="http://www.w3.org/2005/xpath-functions" exclude-result-prefixes="xs fn">
    <xsl:output indent="no" method="text"/>

    <!-- place 4 items in each bag -->

    <xsl:template match="/">
        <xsl:for-each-group select="/market/department/product" 
             group-ending-with="/market/department/product[position() mod 4 = 0]">
            <xsl:variable name="file" 
                 select="concat('bags/bag',position(),'.txt')"/>
            <xsl:result-document href="{$file}">

                <xsl:value-of select="position()"/>
                <xsl:for-each select="current-group()">
                    <xsl:value-of select="."/>
                </xsl:for-each>

            </xsl:result-document>
        </xsl:for-each-group>
    </xsl:template>

</xsl:transform>

Resulting bag1.txt

1applebananagrapecarrotpotatosquashplatesnapkinscupsshirtsshortssocks

Resulting bag2.txt

file does not exist!

Expected bag1.txt

1applebananagrapecarrot

Expected bag2.txt

2potatosquashplatesnapkins

My debugging conclusions: it seems that position() is never 4, because each department has only 3 items. If I change mod 4 to mod 2, I get multiple bags: bag1 contains 2 items, but every later bag except the last contains 3 items. Each bag ends at the 2nd item of a department, and every bag except the first includes the last item of the previous department.

Resulting bag1.txt

1applebanana

Resulting bag2.txt

2grapecarrotpotato

Expected bag1.txt

1applebanana

Expected bag2.txt

2grapecarrot
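A quick way to see which products the mod 2 pattern actually matches is to reuse the same pattern as a template match. This is a diagnostic sketch, assuming the sample XML above is the input and Saxon 9 HE runs it; the mode name check is an arbitrary illustrative choice.

```xslt
<xsl:transform version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>

    <xsl:template match="/">
        <!-- Visit all 12 products in document order. -->
        <xsl:apply-templates select="/market/department/product" mode="check"/>
    </xsl:template>

    <!-- Same pattern as in group-ending-with, but with mod 2. In a pattern, the
         predicate's position() counts product siblings within one department. -->
    <xsl:template match="/market/department/product[position() mod 2 = 0]" mode="check">
        <xsl:value-of select="."/>
        <xsl:text>&#10;</xsl:text>
    </xsl:template>

    <!-- Products that do not match the pattern above produce no output. -->
    <xsl:template match="product" mode="check"/>
</xsl:transform>
```

It prints banana, potato, napkins and shorts, the second product of each department, which are exactly the items at which the observed bags end.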

This suggests to me that position() is relative to the parent element, not to the selection. I would like position() to be relative to the selection. From what I have researched, position() should be relative to the selection, as described in this answer:

Final hint: position() does not tell you the position of the node within its parent. It tells you the position of the current node relative to the list of nodes you are processing right now.

Find the position of an element within its parent with XSLT / XPath
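That hint does hold for the sequence selected by xsl:for-each. The following sketch, assuming the sample XML above as input, prints the context position of each product while iterating over the whole selection.

```xslt
<xsl:transform version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>

    <xsl:template match="/">
        <!-- position() is evaluated against the sequence selected by
             for-each (all 12 products), so it runs from 1 to 12. -->
        <xsl:for-each select="/market/department/product">
            <!-- A sequence in value-of is joined with a single space in XSLT 2.0. -->
            <xsl:value-of select="position(), ."/>
            <xsl:text>&#10;</xsl:text>
        </xsl:for-each>
    </xsl:template>
</xsl:transform>
```

Here potato is reported as 5, so the surprise is specific to the pattern in group-ending-with, not to position() in general.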

The following question mentions that patterns are interpreted differently from select expressions with respect to context. After reading it, I still don't know how to change my pattern to get the behavior I expect.

Using for-each-group for high performance XSLT

Based on the behavior I currently observe: if I had 9 fruit, 4 vegetable, and 20 paper products and used mod 5, bag1 would contain the first 5 fruit products, and bag2 would contain the last 4 fruit, all 4 vegetables, and the first 5 paper products.

The current behavior is not the behavior I am looking for.

Onceler

2 Answers


Try using group-adjacent here instead of group-ending-with:

 <xsl:for-each-group select="/market/department/product" 
                     group-adjacent="floor((position() - 1) div 4)">

Or this...

 <xsl:for-each-group select="/market/department/product" 
                     group-adjacent="ceiling(position() div 4)">

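So, group the items according to their position divided by 4 and rounded to an integer. This works because, when the group-adjacent key is evaluated, position() is the item's position in the whole selected sequence, not its position among its siblings.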
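Plugged into the stylesheet from the question, the first variant might look like this. It is a sketch assuming Saxon 9 HE and that the relative href in xsl:result-document resolves the same way as in the question; the separator="" attribute simply replaces the inner xsl:for-each.

```xslt
<xsl:transform version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>

    <!-- place 4 items in each bag -->
    <xsl:template match="/">
        <!-- While the key is computed, position() is the item's position in the
             selected sequence (1 to 12): key 0 for items 1-4, 1 for items 5-8, ... -->
        <xsl:for-each-group select="/market/department/product"
                            group-adjacent="floor((position() - 1) div 4)">
            <!-- Inside the body, position() is the number of the current group. -->
            <xsl:result-document href="bags/bag{position()}.txt">
                <xsl:value-of select="position()"/>
                <!-- Concatenate the products of this bag with no separator. -->
                <xsl:value-of select="current-group()" separator=""/>
            </xsl:result-document>
        </xsl:for-each-group>
    </xsl:template>
</xsl:transform>
```

With the sample input this should write bag1.txt containing 1applebananagrapecarrot, bag2.txt containing 2potatosquashplatesnapkins, and bag3.txt containing 3cupsshirtsshortssocks.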

Tim C

Tim C has already explained how to get the desired behavior; this is just a note to help you understand your error.

The position() function and the dynamic context

The position() function returns the position of an item within a given sequence whose identity is given by the context. The function often does return the position of an element among the children of its parent, but that is because in practice the rules for determining the dynamic context for the evaluation of XPath expressions often specify that the relevant sequence is the sequence of an element's child nodes. The position() function is not 'scoped' to the parent element as part of its definition.

The value of the position() function is the context position, which is defined as "the position of the context item within the sequence of items currently being processed". Like the context item, the context position (and the context size returned by last()) is part of the dynamic context within which an XPath expression is evaluated. In the evaluation of any non-atomic XPath expression, the dynamic context may be different for different subexpressions.

In particular, the XPath specification stipulates that "When an expression E1/E2 or E1[E2] is evaluated, each item in the sequence obtained by evaluating E1 becomes the context item in the inner focus for an evaluation of E2."

The expression in your group-ending-with attribute

In the expression /market/department/product[position() mod 4 = 0], the rule just quoted means that the expression product[position() mod 4 = 0] is evaluated separately for each item in the sequence /market/department. That is, for each department element in that sequence, the expression product[...] is evaluated. That right-hand expression in turn is equivalent to child::product[...], so for each evaluation of the right-hand expression the sequence in question is the sequence of elements named product which are children of the current department element. Within the expression product[position() mod 4 = 0], the same basic rule applies: the filter expression within square brackets is evaluated in the context given by the expression product. As a consequence, the context position (the value returned by position()) is the position of the current product element among its sibling elements. Since no department element in the input has as many as four children, the value of position() is never greater than three and each filter expression evaluates to false, so the expression as a whole evaluates to the empty sequence.

A similar expression with a different value

In the expression (/market/department/product)[position() mod 4 = 0], by contrast, the filter expression is evaluated in the context of the sequence of all product elements in the document (strictly speaking, those with the specified path, which in this case is all the product elements in the document). Product elements which are children of different department elements are lumped into the same sequence and then the predicate is applied once for each element. The value of position() ranges from 1 to 12 and the overall expression selects the products with values carrot, napkins, and socks.
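The difference between the two expressions can be checked directly. This sketch assumes the question's sample XML as input and evaluates both expressions as ordinary select expressions, not as patterns.

```xslt
<xsl:transform version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>

    <xsl:template match="/">
        <!-- Predicate applied to the step product, once per department:
             position() only reaches 3, so nothing is selected. -->
        <xsl:value-of select="count(/market/department/product[position() mod 4 = 0])"/>
        <xsl:text>&#10;</xsl:text>
        <!-- Predicate applied to the parenthesized sequence of all 12 products:
             position() runs from 1 to 12, selecting items 4, 8 and 12. -->
        <xsl:value-of select="(/market/department/product)[position() mod 4 = 0]"/>
        <xsl:text>&#10;</xsl:text>
    </xsl:template>
</xsl:transform>
```

The first line of output is 0 and the second is carrot napkins socks.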

You can't simply use the second expression in your group-ending-with attribute, because it's not allowed (the attribute value must be a pattern, not a general XPath expression). And even if you could, there are other problems in the template which would then require repair.

But you should clear your mind of the notion that position() always and only denotes the position of a node among the children of its parent.
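If you do want to keep group-ending-with, one option (not taken from either answer) is a pattern whose predicate avoids position() altogether and counts earlier product elements along the preceding axis. This is a sketch assuming the sample XML as input; it only equals "position in the selection" because the selection here is every product element in document order, and counting the preceding axis for every item may be slow on large documents.

```xslt
<xsl:transform version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>

    <xsl:template match="/">
        <!-- count(preceding::product) + 1 is the product's document-wide index,
             so the pattern matches the 4th, 8th, 12th, ... product. -->
        <xsl:for-each-group select="/market/department/product"
                            group-ending-with="product[(count(preceding::product) + 1) mod 4 = 0]">
            <!-- position() in the body is the group number. -->
            <xsl:value-of select="concat(position(), ': ', string-join(current-group(), ' '))"/>
            <xsl:text>&#10;</xsl:text>
        </xsl:for-each-group>
    </xsl:template>
</xsl:transform>
```

This prints three lines: 1: apple banana grape carrot, 2: potato squash plates napkins, and 3: cups shirts shorts socks.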

A simple arithmetic example

It may help to consider some expressions that involve no nodes at all.

The expression

(1 to 100)

denotes the sequence of natural numbers from 1 to 100, inclusive. I'll call that S1. The expression

(1 to 100) [position() mod 4 eq 0]

filters out of S1 everything except the ones whose context positions are evenly divisible by 4, so it denotes the sequence (4, 8, ..., 96, 100). I'll call this S2. If we append another filter expression, its context is given by the sequence S2, not by S1. So

(1 to 100) [position() mod 4 eq 0] [position() gt 23]

returns the sequence consisting of the 24th and 25th entries in sequence S2, namely (96, 100).
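The same arithmetic can be run as a stylesheet. This sketch ignores its input, so any well-formed document (for example the market XML) will do; the variable name S2 just mirrors the name used above.

```xslt
<xsl:transform version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>

    <xsl:template match="/">
        <!-- S2: the items of S1 = (1 to 100) whose positions are multiples of 4. -->
        <xsl:variable name="S2" select="(1 to 100)[position() mod 4 eq 0]"/>
        <xsl:value-of select="count($S2)"/>
        <xsl:text>&#10;</xsl:text>
        <!-- The second predicate sees positions within S2 (1 to 25), not within S1. -->
        <xsl:value-of select="(1 to 100)[position() mod 4 eq 0][position() gt 23]"/>
        <xsl:text>&#10;</xsl:text>
        <!-- Same result when the second predicate is applied to the variable. -->
        <xsl:value-of select="$S2[position() gt 23]"/>
        <xsl:text>&#10;</xsl:text>
    </xsl:template>
</xsl:transform>
```

The output is 25, then 96 100 twice.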

C. M. Sperberg-McQueen