Given an input document that is a series of same-level nodes, I want to find the nodes that occur between two flags (which are themselves nodes). Each pair of flags can occur multiple times, and the final output should group together all the content found between flags of the same name. I am striking out on this.

Given this input document:

<root>
    <p class="text">Hello world 1.</p>
    <p class="text">Hello world 2.</p>
    <p class="text">Hello world 3.</p>
    <p class="excerptstartone">Dummy text</p> <!-- this flag identifies the start of the nodes I want to select -->
    <p class="text">Hello world 4.</p>
    <p class="text">Hello world 5.</p>
    <p class="text">Hello world 6.</p>
    <p class="excerptendone">Dummy text</p> <!-- this flag identifies the end of the nodes I want to select -->
    <p class="text">Hello world 7.</p>
    <p class="excerptstarttwo">Dummy text</p> <!-- this flag identifies the start of the nodes I want to select -->
    <p class="text">Hello world 8.</p>
    <p class="excerptendtwo">Dummy text</p> <!-- this flag identifies the end of the nodes I want to select -->
    <p class="text">Hello world 9.</p>
    <p class="excerptstartone">Dummy text for starting a new excerpt</p> <!-- this flag identifies the start of the nodes I want to select -->
    <p class="text">Hello world 10.</p>
    <p class="text">Hello world 11.</p>
    <p class="excerptendone">Dummy text</p> <!-- this flag identifies the end of the nodes I want to select -->
    <p class="text">Hello world 12.</p>
    <p class="text">Hello world 13.</p>
    <p class="text">Hello world 14.</p>
    <p class="text">Hello world 15.</p>
    <p class="text">Hello world 16.</p>
    <p class="text">Hello world 17.</p>
</root>

I want this output:

<root>
    <p class="excerptstartone">Dummy text</p>
    <p class="text">Hello world 4.</p>
    <p class="text">Hello world 5.</p>
    <p class="text">Hello world 6.</p>
    <p class="text">Hello world 10.</p>
    <p class="text">Hello world 11.</p>
    <p class="excerptendone">Dummy text</p>
    <p class="excerptstarttwo">Dummy text</p>
    <p class="text">Hello world 8.</p>
    <p class="excerptendtwo">Dummy text</p>
</root>

Note: The flags will always start with "excerptstart" and "excerptend", and the suffixes of a pair will always match (that is, business rules guarantee there will always be an "excerptendone" if there is an "excerptstartone").

This is what I have so far. I can find the collections I want as long as I hard code the excerptstart suffix (i.e., 'one', 'two'). I am stuck on trying to generalize it so the suffix doesn't have to be hard coded (I should also say I don't care about retaining the start/end paragraph "flags" in the result tree; I've hard coded those here for convenience in assessing the result tree):

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">
    <xsl:template match="root">
        <root>
            <p class="excerptstartone">Dummy text</p>
            <xsl:for-each select="p[@class='excerptstartone']">
                <xsl:sequence select="following-sibling::node() intersect following-sibling::p[@class='excerptendone'][1]/preceding-sibling::node()"/>
            </xsl:for-each>
            <p class="excerptendone">Dummy text</p>
            <p class="excerptstarttwo">Dummy text</p>
            <xsl:for-each select="p[@class='excerptstarttwo']">
                <xsl:sequence select="following-sibling::node() intersect following-sibling::p[@class='excerptendtwo'][1]/preceding-sibling::node()"/>
            </xsl:for-each>
            <p class="excerptendtwo">Dummy text</p>
        </root>
    </xsl:template>
    <xsl:template match="text()"/>
</xsl:stylesheet>
user2379780

3 Answers

Have a look at, for example, the Kayessian method for intersecting two node-sets in XPath 1.0.
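
For reference, the Kayessian formula for the intersection of two node-sets `$ns1` and `$ns2` is `$ns1[count(. | $ns2) = count($ns2)]`: a node of `$ns1` also belongs to `$ns2` exactly when adding it to `$ns2` does not make `$ns2` any larger. A minimal sketch of the idea, assuming the input document from the question and hard-coding the first "one" excerpt purely for illustration (the variable names are my own):

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <xsl:template match="/root">
    <!-- Hard-coded to the first "one" excerpt, for illustration only. -->
    <xsl:variable name="start" select="p[@class = 'excerptstartone'][1]"/>
    <xsl:variable name="end"
                  select="$start/following-sibling::p[@class = 'excerptendone'][1]"/>
    <!-- ns1: every element after the start flag.
         ns2: every element before the matching end flag. -->
    <xsl:variable name="ns1" select="$start/following-sibling::*"/>
    <xsl:variable name="ns2" select="$end/preceding-sibling::*"/>
    <root>
      <!-- Keep the nodes of ns1 whose union with ns2 is no larger than ns2. -->
      <xsl:copy-of select="$ns1[count(. | $ns2) = count($ns2)]"/>
    </root>
  </xsl:template>
</xsl:stylesheet>
```

Applied to the question's input, this should copy only the paragraphs "Hello world 4." through "Hello world 6."; in XSLT 2.0 the same selection can be written with the `intersect` operator.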

Or try this:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>

    <!-- Index every start flag by its class, e.g. 'excerptstartone'. -->
    <xsl:key name="kExcerptstart" match="p[starts-with(@class, 'excerptstart')]" use="@class"/>

    <xsl:template match="/*">
        <xsl:copy>
            <xsl:apply-templates select="p"/>
        </xsl:copy>
    </xsl:template>

    <!-- Suppress every paragraph that is not handled by the template below. -->
    <xsl:template match="p"/>

    <!-- Fires once per distinct start flag: only the first p with a given class matches. -->
    <xsl:template match="p[generate-id() = generate-id(key('kExcerptstart', @class)[1])]">
        <xsl:copy-of select="."/>
        <xsl:variable name="start" select="@class"/>
        <!-- Visit every start flag that shares this class. -->
        <xsl:for-each select="key('kExcerptstart', $start)">
            <xsl:variable name="end" select="following-sibling::p[starts-with(@class, 'excerptend')][1]"/>
            <xsl:variable name="ns1" select="following-sibling::*"/>
            <xsl:variable name="ns2" select="$end/preceding-sibling::*"/>
            <!-- Kayessian intersection: elements after the start flag and before the end flag. -->
            <xsl:copy-of select="$ns1[count(. | $ns2) = count($ns2)]"/>
        </xsl:for-each>
        <xsl:copy-of select="following-sibling::p[starts-with(@class, 'excerptend')][1]"/>
    </xsl:template>
</xsl:stylesheet>

This generates the following output:

<root>
  <p class="excerptstartone">Dummy text</p>
  <p class="text">Hello world 4.</p>
  <p class="text">Hello world 5.</p>
  <p class="text">Hello world 6.</p>
  <p class="text">Hello world 10.</p>
  <p class="text">Hello world 11.</p>
  <p class="excerptendone">Dummy text</p>
  <p class="excerptstarttwo">Dummy text</p>
  <p class="text">Hello world 8.</p>
  <p class="excerptendtwo">Dummy text</p>
</root>
hr_117
  • Thanks, the link helped a lot. And +1 for an answer that works. Intersect is part of what I was looking for. I should've remembered that operator. I am having a problem generalizing my current solution though. – user2379780 Jun 15 '13 at 16:34

(I should also say I don't care about retaining the start/end paragraph "flags" in the result tree; I've hard coded those here for convenience in assessing the result tree)

Here is a simple solution, using just grouping:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/*">
  <root>
     <xsl:for-each-group select=
     "p[@class eq 'text']
         [preceding-sibling::p[starts-with(@class, 'excerpt')][1]
             [starts-with(@class, 'excerptstart')]
         ]"
          group-by="preceding-sibling::p[starts-with(@class, 'excerpt')][1]/@class">

        <xsl:sequence select="current-group()"/>
     </xsl:for-each-group>
  </root>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the XML document provided in the question, the wanted, correct result is produced:

<root>
   <p class="text">Hello world 4.</p>
   <p class="text">Hello world 5.</p>
   <p class="text">Hello world 6.</p>
   <p class="text">Hello world 10.</p>
   <p class="text">Hello world 11.</p>
   <p class="text">Hello world 8.</p>
</root>
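
To spell out how the grouping works: the `select` keeps only those text paragraphs whose nearest preceding flag is a start flag (so they lie inside an excerpt), and `group-by` uses that flag's class as the key, so all excerpts sharing a flag name land in one group. A minimal sketch, assuming you want each group visibly delimited in the output, wraps it in an element; `excerpt` and its `name` attribute are hypothetical names that do not appear in the question:

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <xsl:template match="/*">
    <root>
      <xsl:for-each-group
          select="p[@class eq 'text']
                   [preceding-sibling::p[starts-with(@class, 'excerpt')][1]
                      [starts-with(@class, 'excerptstart')]]"
          group-by="preceding-sibling::p[starts-with(@class, 'excerpt')][1]/@class">
        <!-- 'excerpt' and 'name' are illustrative names, not part of the question.
             The grouping key is the start flag's class, e.g. 'excerptstartone',
             so the attribute receives just the suffix, e.g. 'one'. -->
        <excerpt name="{substring-after(current-grouping-key(), 'excerptstart')}">
          <xsl:sequence select="current-group()"/>
        </excerpt>
      </xsl:for-each-group>
    </root>
  </xsl:template>
</xsl:stylesheet>
```

With the question's input this should yield one `excerpt` element for "one" (paragraphs 4, 5, 6, 10, 11) followed by one for "two" (paragraph 8), since groups are emitted in order of first appearance.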
Dimitre Novatchev
  • Thank you, Dimitre. I had been trying to figure out how to do it with for-each-group as from the moment I needed to do this I thought that should be the solution. But I couldn't suss it out. I need to figure out how to enhance my thinking about/working with groups. Thanks again. – user2379780 Jun 16 '13 at 04:11

This provides a general, albeit a little clunky (due to the use of two for-eaches), solution for what I want to do:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">
    <xsl:template match="root">
        <root>
            <!-- One entry per distinct start-flag class, e.g. 'excerptstartone'. -->
            <xsl:variable name="uniqueExcerptClasses" select="distinct-values(//@class[starts-with(.,'excerptstart')])"/>
            <!-- Remember the root element: inside the loop the context item is a string. -->
            <xsl:variable name="context" select="."/>
            <xsl:for-each select="$uniqueExcerptClasses">
                <xsl:text>
        </xsl:text><p>start excerpt</p><xsl:text>
        </xsl:text>
                <xsl:variable name="curExcerpt" select="."/>
                <xsl:for-each select="$context/p[@class=$curExcerpt]">
                    <xsl:sequence select="following-sibling::node() intersect following-sibling::p[@class=replace($curExcerpt,'start','end')][1]/preceding-sibling::node()"/>
                </xsl:for-each>
                <xsl:text>
        </xsl:text><p>end excerpt</p><xsl:text>
        </xsl:text>
            </xsl:for-each>
        </root>
    </xsl:template>
    <xsl:template match="text()"/>
</xsl:stylesheet>
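
One possible way to trim this down, offered as a sketch rather than a definitive rewrite: group the start flags by their suffix with `xsl:for-each-group` and build the matching end-flag class with `concat()`. This avoids `replace($curExcerpt,'start','end')`, which rewrites every occurrence of "start" and so would produce the wrong class for a suffix that itself contains "start". The variable name `endClass` is my own, and the sketch selects `p` siblings instead of `node()` so that the whitespace and comments between the flags are not copied:

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <xsl:template match="root">
    <root>
      <!-- One group per suffix ('one', 'two', ...), in order of first appearance. -->
      <xsl:for-each-group select="p[starts-with(@class, 'excerptstart')]"
                          group-by="substring-after(@class, 'excerptstart')">
        <!-- Class of the end flag that pairs with this group's start flags. -->
        <xsl:variable name="endClass"
                      select="concat('excerptend', current-grouping-key())"/>
        <!-- For every start flag in the group: the p siblings after it that
             also precede its nearest matching end flag. -->
        <xsl:sequence select="current-group()/
            (following-sibling::p
             intersect
             following-sibling::p[@class = $endClass][1]/preceding-sibling::p)"/>
      </xsl:for-each-group>
    </root>
  </xsl:template>
</xsl:stylesheet>
```

On the question's input this should emit paragraphs 4, 5, 6, 10 and 11 for the "one" group and then paragraph 8 for the "two" group, without the start/end marker paragraphs.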
user2379780