I have read "XSLT Grouping Examples" and "Using for-each-group for high performance XSLT", but I still have a problem with for-each-group.

My XML

<?xml version="1.0" encoding="UTF-8"?>
<body>
   <p name="h-title" other="main">Introduction</p>
   <p name="h1-title " other="other-h1">XSLT and XQuery</p>
   <p name="h2-title" other=" other-h2">XSLT</p>
   <p name="">
      <p1 name="bold"> XSLT is used to write stylesheets.</p1>
   </p>
   <p name="h2-title " other="other-h2">XQuery</p>
   <p name="">
      <p1 name="bold"> XQuery is used to query XML databases.</p1>
   </p>
   <p name="h3-title" other="other-h3">XQuery and stylesheets</p>
   <p name="">
      <p1 name="bold"> XQuery is used to query XML databases.</p1>
   </p>
   <p name="h1-title " other="other-h1">XSLT and XQuery</p>
   <p name="h2-title " other=" other-h2">XSLT</p>
</body>

My Wanted Output

<?xml version="1.0" encoding="UTF-8"?>
<body>
   <p name="h-title " other="main">Introduction</p>
   <h1>
      <p name="h1-title " other="other-h1"> XSLT and XQuery </p>
      <h2>
         <p name="h2-title " other="other-h2">XSLT</p>
         <p name="">
            <p1 name="bold">XSLT is used to write stylesheets.
            </p1>
         </p>
      </h2>
      <h2>
         <p name="h2-title " other="other-h2">XQuery</p>
         <p name="">
            <p1 name="bold"> XQuery is used to query XML databases.</p1>
         </p>
         <h3>
            <p name="h3-title " other="other-h3">XQuery and stylesheets</p>
            <p name="">
               <p1 name="bold"> XQuery is used to query XML databases.</p1>
            </p>
         </h3>
      </h2>
   </h1>
   <h1>
      <p name="h1-title " other="other-h1">XSLT and XQuery</p>
      <h2>
         <p name="h2-title " other=" other-h2">XSLT</p>
      </h2>
   </h1>
</body>

I tried this, but it does not work:

<xsl:template match="body">


        <body>
            <xsl:for-each-group select="*" group-starting-with="@h1-title"      >
                <h1>
                    <xsl:for-each select="current-group()[self:: h1-title]">
                        <xsl:value-of select="."/> 
                        </xsl:for-each> 
                </h1>
            </xsl:for-each-group>

            <xsl:for-each-group select="*" group-starting-with="@h2-title"      >
                <h2>
                    <xsl:for-each select="current-group()[self::h2-title/@h2-title]">
                        <xsl:value-of select="."/>
                    </xsl:for-each> 
                </h2>
            </xsl:for-each-group>

            <xsl:for-each-group select="*" group-starting-with="@h3-title"      >
                <h3>
                    <xsl:for-each select="current-group()[self::h2-title/@h3-title]">
                        <xsl:value-of select="."/>
                    </xsl:for-each> 
                </h3>
            </xsl:for-each-group>

        </body>

  </xsl:template>

Can someone show me the correct way to get the output I want?

Setinger

3 Answers

5

Here is an XSLT 2.0 stylesheet using for-each-group in a recursive function (I prefer that to a named template with XSLT 2.0):

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:mf="http://example.com/mf"
  exclude-result-prefixes="xs mf">

<xsl:param name="prefix" as="xs:string" select="'h'"/>
<xsl:param name="suffix" as="xs:string" select="'-title'"/>

<xsl:output method="html" version="4.0" indent="yes"/>

<xsl:function name="mf:group" as="node()*">
  <xsl:param name="items" as="node()*"/>
  <xsl:param name="level" as="xs:integer"/>
  <xsl:for-each-group select="$items" group-starting-with="p[@name = concat($prefix, $level, $suffix)]">
    <xsl:choose>
      <xsl:when test="not(self::p[@name = concat($prefix, $level, $suffix)])">
        <xsl:apply-templates select="current-group()"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:element name="h{$level}">
          <xsl:apply-templates select="."/>
          <xsl:sequence select="mf:group(current-group() except ., $level + 1)"/>
        </xsl:element>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:for-each-group>
</xsl:function>

<xsl:template match="@* | node()">
  <xsl:copy>
    <xsl:apply-templates select="@* , node()"/>
  </xsl:copy>
</xsl:template>

<xsl:template match="body">
  <xsl:copy>
    <xsl:sequence select="mf:group(*, 1)"/>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>

When I apply that stylesheet with Saxon 9 to the input

<body>
   <p name="h-title" other="main">Introduction</p>
   <p name="h1-title" other="other-h1">XSLT and XQuery</p>
   <p name="h2-title" other=" other-h2">XSLT</p>
   <p name="">
      <p1 name="bold"> XSLT is used to write stylesheets.</p1>
   </p>
   <p name="h2-title" other="other-h2">XQuery</p>
   <p name="">
      <p1 name="bold"> XQuery is used to query XML databases.</p1>
   </p>
   <p name="h3-title" other="other-h3">XQuery and stylesheets</p>
   <p name="">
      <p1 name="bold"> XQuery is used to query XML databases.</p1>
   </p>
   <p name="h1-title" other="other-h1">XSLT and XQuery</p>
   <p name="h2-title" other=" other-h2">XSLT</p>
</body>

I get the result

<body>
   <p name="h-title" other="main">Introduction</p>
   <h1>
      <p name="h1-title" other="other-h1">XSLT and XQuery</p>
      <h2>
         <p name="h2-title" other=" other-h2">XSLT</p>
         <p name="">

            <p1 name="bold"> XSLT is used to write stylesheets.</p1>

         </p>
      </h2>
      <h2>
         <p name="h2-title" other="other-h2">XQuery</p>
         <p name="">

            <p1 name="bold"> XQuery is used to query XML databases.</p1>

         </p>
         <h3>
            <p name="h3-title" other="other-h3">XQuery and stylesheets</p>
            <p name="">

               <p1 name="bold"> XQuery is used to query XML databases.</p1>

            </p>
         </h3>
      </h2>
   </h1>
   <h1>
      <p name="h1-title" other="other-h1">XSLT and XQuery</p>
      <h2>
         <p name="h2-title" other=" other-h2">XSLT</p>
      </h2>
   </h1>
</body>
Martin Honnen
  • This is really great. It does exactly what I want. The other two answers are also great. I learned a lot from this question. Thank you very much. ^_^ – Setinger Jul 21 '12 at 14:51
3

This transformation uses keys and handles h1-title to h6-title:

<xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output omit-xml-declaration="yes" indent="yes"/>
     <xsl:strip-space elements="*"/>

     <xsl:template match="body">
       <xsl:apply-templates select="p[@name='h1-title']" />
     </xsl:template>

     <xsl:key name="next-headings" match="p[@name='h6-title']"
       use="generate-id(preceding-sibling::p
                         [ @name='h1-title'
                        or @name='h2-title'
                        or @name='h3-title'
                        or @name='h4-title'
                        or @name='h5-title'
                        ][1])" />
     <xsl:key name="next-headings" match="p[@name='h5-title']"
       use="generate-id(preceding-sibling::p
                         [ @name='h1-title'
                        or @name='h2-title'
                        or @name='h3-title'
                        or @name='h4-title'
                        ][1])" />
     <xsl:key name="next-headings" match="p[@name='h4-title']"
       use="generate-id(preceding-sibling::p
                         [ @name='h1-title'
                        or @name='h2-title'
                        or @name='h3-title'
                        ][1])" />
     <xsl:key name="next-headings" match="p[@name='h3-title']"
       use="generate-id(preceding-sibling::p
                         [  @name='h1-title'
                        or @name='h2-title'
                        ][1])" />
     <xsl:key name="next-headings" match="p[@name='h2-title']"
       use="generate-id(preceding-sibling::p
                         [@name='h1-title'][1])" />

     <xsl:key name="immediate-nodes" match=
     "node()[not(self::p)
           or
            not(contains('|h1-title|h2-title|h3-title|h4-title|h5-title|h6-title|',
                         concat('|',@name,'|')
                        )
                )]"
       use="generate-id(preceding-sibling::p
             [contains('|h1-title|h2-title|h3-title|h4-title|h5-title|h6-title|',
                       concat('|',@name,'|')
                       )
             ][1])" />

     <xsl:template match=
      "p[contains('|h1-title|h2-title|h3-title|h4-title|h5-title|h6-title|',
                  concat('|',@name,'|')
                  )]">
       <xsl:variable name="vLevel" select="substring(@name,2,1)" />
       <xsl:element name="h{$vLevel}">
          <xsl:copy-of select="."/>
          <xsl:apply-templates select="key('immediate-nodes', generate-id())" />
          <xsl:apply-templates select="key('next-headings', generate-id())" />
       </xsl:element>
     </xsl:template>

     <xsl:template match="/*/node()" priority="-20">
       <xsl:copy-of select="." />
     </xsl:template>
</xsl:stylesheet>

When applied to this XML document (I corrected the provided one and used uniform values for the name attribute):

<body>
        <p name="h1-title" other="main">Introduction</p>
        <p name="h2-title" other="other-h2">XSLT and XQuery</p>
        <p name="h3-title" other=" other-h3">XSLT</p>
        <p name="">
                <p1 name="bold"> XSLT is used to write stylesheets.</p1>
        </p>
        <p name="h2-title" other="other-h2">XQuery</p>
        <p name="">
                <p1 name="bold"> XQuery is used to query XML databases.</p1>
        </p>
        <p name="h3-title" other="other-h3">XQuery and stylesheets</p>
        <p name="">
                <p1 name="bold"> XQuery is used to query XML databases.</p1>
        </p>
        <p name="h1-title" other="other-h1">XSLT and XQuery</p>
        <p name="h2-title" other=" other-h2">XSLT</p>
</body>

the wanted, correct result is produced:

<h1>
   <p name="h1-title" other="main">Introduction</p>
   <h2>
      <p name="h2-title" other="other-h2">XSLT and XQuery</p>
      <h3>
         <p name="h3-title" other=" other-h3">XSLT</p>
         <p name="">
            <p1 name="bold"> XSLT is used to write stylesheets.</p1>
         </p>
      </h3>
   </h2>
   <h2>
      <p name="h2-title" other="other-h2">XQuery</p>
      <p name="">
         <p1 name="bold"> XQuery is used to query XML databases.</p1>
      </p>
      <h3>
         <p name="h3-title" other="other-h3">XQuery and stylesheets</p>
         <p name="">
            <p1 name="bold"> XQuery is used to query XML databases.</p1>
         </p>
      </h3>
   </h2>
</h1>
<h1>
   <p name="h1-title" other="other-h1">XSLT and XQuery</p>
   <h2>
      <p name="h2-title" other=" other-h2">XSLT</p>
   </h2>
</h1>

Do note:

This transformation solves the main problem of generating the hierarchy. Only trivial changes are needed if it is required that the top level name attribute has the value "h-title".

If more hierarchy levels are needed, the change is purely mechanical: add the corresponding "or" clauses to the key definitions, and append the new name attribute values to the pipe-delimited string of heading names.

Here I have adapted and re-used a solution that Jeni Tennison gave for a similar problem.

Dimitre Novatchev
  • Thank you very much Dimitre. I wish to use xsl:for-each-group. I am trying it still. I will tell if there is a problem. – Setinger Jul 21 '12 at 04:48
  • @Setinger: If I were in your place I would take the key-based solution -- simply because it works and is one of the most efficient solutions (sublinear, close to O(1)). What I mean is that `xsl:for-each-group` most probably wouldn't be more efficient. – Dimitre Novatchev Jul 21 '12 at 05:09
  • Dimitre, do you think using a recursive call (as Michael says in the other answer) will be less efficient than this? – Setinger Jul 21 '12 at 05:11
  • @Setinger: As a rule, a recursive solution (for big size of the input) is most often less efficient than a non-recursive one -- in memory usage and also in actual execution time. Often a primitive recursion results in stack overflow exception (typically for N >= 1000) and to avoid this, one has to write a tail-recursive solution (and hope that the XSLT processor will be able to optimize this to iteration) or use DVC-recursion, which sometimes is difficult to write. – Dimitre Novatchev Jul 21 '12 at 05:15
  • @Setinger: I am going to bed now -- would be glad to continue this conversation tomorrow. – Dimitre Novatchev Jul 21 '12 at 05:17
  • what if I have so many like h1, h2, h3, h4, ... , etc. (about 20) then what would be good? okay, we will continue this tomorrow. thank you. ^_^ – Setinger Jul 21 '12 at 05:29
  • @Setinger: As I said, both: using keys and `xsl:for-each-group` are equally good. I personally would take immediately whatever solution of these two is available and will use it immediately, without waiting for a solution of the other type to become available. – Dimitre Novatchev Jul 21 '12 at 05:55
  • Thank you very much for this answer and the explanations Dimitre. I learnt a lot here. ^_^ – Setinger Jul 21 '12 at 14:53
  • @Setinger: You are welcome. Do note that this answer doesn't assume that all levels are "adjacent", that is, we don't assume that the next level after h1 is h2. This answer works for any sequence of levels in which each nested hN simply has a larger N than the current level. Thus, if the levels are h1, h3 and h7, this solution still handles them OK. The other answers don't seem to cover such a scenario. – Dimitre Novatchev Jul 21 '12 at 15:00
  • Yes I just now understood that. Luckily so far I have adjacent things. But I do not know about future things. Good you told it now or I would have found it hard to get that. Thank you. ^_^ – Setinger Jul 21 '12 at 15:06
  • @Setinger: I will expect more, nice and challenging, questions from you in the future :) – Dimitre Novatchev Jul 21 '12 at 15:26
2

Each of your grouping steps is taking the original set of elements as input, whereas you need each step to work on the groups produced by the previous grouping step. And there are lots of other errors too, for example h1-title is not an attribute name.

It needs to be something like this:

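<xsl:for-each-group select="*" group-starting-with="*[@name='h1-title']">
  <xsl:choose>
    <xsl:when test="@name='h1-title'">
      <h1>
        <xsl:for-each-group select="current-group()" group-starting-with="*[@name='h2-title']">
          <xsl:choose>
            <xsl:when test="@name='h2-title'">
              <h2>
                ... similar logic for the next level ...
              </h2>
            </xsl:when>
            <xsl:otherwise>
              <!-- the h1 heading itself and anything before the first h2 -->
              <xsl:copy-of select="current-group()"/>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:for-each-group>
      </h1>
    </xsl:when>
    <xsl:otherwise>
      <!-- elements that precede the first h1 heading -->
      <xsl:copy-of select="current-group()"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:for-each-group>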

You can nest that as deeply as you want, depending on how many levels you need to handle; or, if you want to handle an indefinite number of levels, you can put the code in a named template and make a recursive call to handle the next level. At the innermost level, leave out the xsl:choose and just do <xsl:copy-of select="current-group()"/>.
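
The recursive named-template variant could look roughly like the sketch below. It is only an illustration under some assumptions: XSLT 2.0, heading values of the form h1-title, h2-title, ... without trailing spaces, and no skipped levels. The template name group and its parameters items and level are made up for this example and do not come from the question.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical helper: groups $items by the heading of the given level
       and calls itself for the next level inside each group. -->
  <xsl:template name="group">
    <xsl:param name="items" as="element()*"/>
    <xsl:param name="level" as="xs:integer"/>
    <xsl:variable name="heading" select="concat('h', $level, '-title')"/>
    <xsl:for-each-group select="$items"
                        group-starting-with="*[@name = $heading]">
      <xsl:choose>
        <!-- The group starts with a heading of this level: wrap it. -->
        <xsl:when test="@name = $heading">
          <xsl:element name="h{$level}">
            <xsl:copy-of select="."/>
            <xsl:call-template name="group">
              <xsl:with-param name="items" select="current-group() except ."/>
              <xsl:with-param name="level" select="$level + 1"/>
            </xsl:call-template>
          </xsl:element>
        </xsl:when>
        <!-- No heading of this level in front: copy the items unchanged.
             This branch is also where the recursion stops. -->
        <xsl:otherwise>
          <xsl:copy-of select="current-group()"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each-group>
  </xsl:template>

  <xsl:template match="body">
    <body>
      <xsl:call-template name="group">
        <xsl:with-param name="items" select="*"/>
        <xsl:with-param name="level" select="1"/>
      </xsl:call-template>
    </body>
  </xsl:template>

</xsl:stylesheet>
```

The recursion depth is bounded by the number of heading levels, not by the number of elements, because each call only descends one level. Like the nested version, this sketch does not wrap headings that skip a level (for example an h3-title directly under an h1-title); they are copied unwrapped.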

(I just noticed the trailing spaces in the "name" attribute. If these really exist, you will need to include them in the comparison test, or do normalize-space() to get rid of them.)
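
As a minimal sketch of the normalize-space() option, assuming XSLT 2.0 and only the outermost (h1) level for brevity, the comparison can be made tolerant of values such as "h1-title " like this:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="body">
    <body>
      <!-- normalize-space() strips leading and trailing whitespace, so
           name="h1-title " and name="h1-title" both start a new group. -->
      <xsl:for-each-group select="*"
          group-starting-with="*[normalize-space(@name) = 'h1-title']">
        <xsl:choose>
          <xsl:when test="normalize-space(@name) = 'h1-title'">
            <h1>
              <xsl:copy-of select="current-group()"/>
            </h1>
          </xsl:when>
          <!-- Elements before the first h1 heading stay unwrapped. -->
          <xsl:otherwise>
            <xsl:copy-of select="current-group()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </body>
  </xsl:template>

</xsl:stylesheet>
```

The same normalize-space(@name) comparison would have to be used at every nesting level, both in the group-starting-with pattern and in the xsl:when test.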

Michael Kay
  • you are awake. ^_^ thank you very much Michael. The white spaces are not there. There will be so many for-each-group's nested as much as I have like h1, h2, h3, h4, etc. I think other way you tell will be great. I will try that and tell you if I find problems. +1 for the proposal. ^_^ – Setinger Jul 20 '12 at 20:05
  • Thank you Michael. I will go with Martin's answer. ^_^ – Setinger Jul 21 '12 at 14:54