This is tricky, but doable (long read ahead, sorry for that).
The key to "consecutiveness" in terms of XPath axes (which select every node in a given direction, not just the adjacent ones) is to check whether the closest node in the opposite direction that "first fulfills the condition" is also the one that "started" the series at hand:
a
b <- first node to fulfill the condition, starts series 1
b <- series 1
b <- series 1
a
b <- first node to fulfill the condition, starts series 2
b <- series 2
b <- series 2
a
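In plain Python terms (a minimal sketch on a list, not XPath), the same idea labels every matching item with the index of the starter of its run. An item starts a run if its predecessor fails the condition. Otherwise it belongs to the closest starter found by walking backwards. The name `series_starters` is made up for illustration.

```python
def series_starters(items, cond):
    """Return {index: starter_index} for every item that fulfills cond."""
    result = {}
    for i, item in enumerate(items):
        if not cond(item):
            continue
        if i == 0 or not cond(items[i - 1]):
            # "first node to fulfill the condition": starts a new series
            result[i] = i
        else:
            # walk backwards to the closest starter, i.e. the closest matching
            # item whose own predecessor does not match
            j = i - 1
            while not (cond(items[j]) and (j == 0 or not cond(items[j - 1]))):
                j -= 1
            result[i] = j
    return result


# The a/b diagram: series 1 starts at index 1, series 2 at index 5.
print(series_starters(list("abbbabbba"), lambda c: c == "b"))
# {1: 1, 2: 1, 3: 1, 5: 5, 6: 5, 7: 5}
```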
In your case, a series consists of <span> nodes that have the string x in their @class:
span[contains(concat(' ', @class, ' '),' x ')]
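Note that I pad @class with spaces (and search for ' x ' rather than 'x') to avoid false positives such as class="box" or class="x2", while still matching class="a x b".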
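To see why the padding matters, here is a small Python sketch of the same string test. `has_class_token` and `naive_contains` are illustrative helper names, not part of any library; a missing attribute is treated as the empty string, as XPath does.

```python
def has_class_token(class_attr, token):
    """Mimics contains(concat(' ', @class, ' '), ' x ') for token 'x'."""
    padded = " " + (class_attr or "") + " "
    return (" " + token + " ") in padded


def naive_contains(class_attr, token):
    """Mimics the unpadded test contains(@class, 'x')."""
    return token in (class_attr or "")


print(has_class_token("a x b", "x"))  # True
print(has_class_token("box", "x"))    # False
print(naive_contains("box", "x"))     # True: the false positive padding avoids
```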
A <span> that starts a series (i.e. one that "first fulfills the condition") can be defined as one that has an x in its class and is not directly preceded by another <span> that also has an x:
not(preceding-sibling::span[1][contains(concat(' ', @class, ' '),' x ')])
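Python's xml.etree.ElementTree supports only a limited XPath subset with no preceding-sibling axis, so here is a rough emulation of the starter test that scans the parent's <span> children instead. The input document is hypothetical, and `has_x` and `is_starter` are made-up helper names.

```python
import xml.etree.ElementTree as ET


def has_x(el):
    # contains(concat(' ', @class, ' '), ' x ')
    return " x " in " " + (el.get("class") or "") + " "


def is_starter(parent, el):
    """Rough analogue of the test above, applied to a span that has an x."""
    spans = [c for c in parent if c.tag == "span"]
    i = next(k for k, s in enumerate(spans) if s is el)
    # preceding-sibling::span[1] is the nearest earlier span, i.e. spans[i - 1]
    return has_x(el) and not (i > 0 and has_x(spans[i - 1]))


# Hypothetical input; only its shape matters here.
doc = ET.fromstring(
    '<p><span>1</span><span class="x">2</span><span class="x">3</span>'
    '<span class="x">4</span><span>5</span><span class="x">6</span>'
    '<span>7</span><span class="x">8</span></p>'
)
print([s.text for s in doc if is_starter(doc, s)])  # ['2', '6', '8']
```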
We must check this condition in an <xsl:if> so that the template does actual work only for "starter nodes" and outputs nothing for the other nodes of a series. (The template still has to match those nodes; otherwise the built-in template would copy their text.)
Now to the tricky part.
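From each of these "starter nodes" we must select all following-sibling::span nodes that have an x in their class. We also include the current span itself, because following-sibling:: never selects the context node; without the . the starter would be missing from its own series (and a one-element series would come out empty). Okay, easy enough: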
. | following-sibling::span[contains(concat(' ', @class, ' '),' x ')]
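Note that this union on its own selects too much: it also picks up x spans that belong to later series. A quick Python emulation on a hypothetical input (helper names are illustrative) makes that visible:

```python
import xml.etree.ElementTree as ET


def has_x(el):
    # contains(concat(' ', @class, ' '), ' x ')
    return " x " in " " + (el.get("class") or "") + " "


def candidates(parent, starter):
    """Rough analogue of . | following-sibling::span[contains(...)]"""
    spans = [c for c in parent if c.tag == "span"]
    i = next(k for k, s in enumerate(spans) if s is starter)
    # following-sibling:: never includes the context node, hence the explicit starter
    return [starter] + [s for s in spans[i + 1:] if has_x(s)]


# Hypothetical input with two separate series: (2, 3) and (5).
doc = ET.fromstring(
    '<p><span>1</span><span class="x">2</span><span class="x">3</span>'
    '<span>4</span><span class="x">5</span></p>'
)
print([s.text for s in candidates(doc, doc[1])])  # ['2', '3', '5']
```

That stray 5 is what the next step has to filter out.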
For each of these we now find out whether their closest "starter node" is the one the template is working on (i.e. the one that started their series). This means:
they must be part of a series (i.e. they must follow a span with an x):
preceding-sibling::span[1][contains(concat(' ', @class, ' '),' x ')]
now remove any span whose starter node is not the current series starter. To find a span's starter, we take the closest preceding-sibling span (that has an x) which itself is not directly preceded by a span with an x:
preceding-sibling::span[contains(concat(' ', @class, ' '),' x ')][
  not(preceding-sibling::span[1][contains(concat(' ', @class, ' '),' x ')])
][1]
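Then we use generate-id() to check node identity. If the found node is identical to $starter, then the current span belongs to the series that $starter started. (We need the variable because inside the predicate, . refers to the span being tested, not to the starter.)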
Putting it all together:
<xsl:template match="span[contains(concat(' ', @class, ' '),' x ')]">
  <!-- only series starters produce output; the other x spans are collected below -->
  <xsl:if test="not(preceding-sibling::span[1][contains(concat(' ', @class, ' '),' x ')])">
    <!-- inside the predicate below, "." is the candidate span, so remember the starter -->
    <xsl:variable name="starter" select="." />
    <x>
      <xsl:for-each select="
        . | following-sibling::span[contains(concat(' ', @class, ' '),' x ')][
          preceding-sibling::span[1][contains(concat(' ', @class, ' '),' x ')]
          and
          generate-id($starter)
          =
          generate-id(
            preceding-sibling::span[contains(concat(' ', @class, ' '),' x ')][
              not(preceding-sibling::span[1][contains(concat(' ', @class, ' '),' x ')])
            ][1]
          )
        ]
      ">
        <xsl:value-of select="text()" />
      </xsl:for-each>
    </x>
  </xsl:if>
</xsl:template>
And yes, I know it's not pretty. There is a more efficient solution based on <xsl:key>; Dimitre's answer shows it.
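This is not Dimitre's stylesheet, just a rough Python illustration of where a key-style approach saves work. The template above re-scans the preceding siblings of every candidate, for every starter. Here each x span is instead filed under its starter in a single forward pass, much like an <xsl:key> whose key value is the starter's generate-id(). The input and the name `key_by_starter` are hypothetical.

```python
import xml.etree.ElementTree as ET


def has_x(el):
    # contains(concat(' ', @class, ' '), ' x ')
    return " x " in " " + (el.get("class") or "") + " "


def key_by_starter(parent):
    """Build {id(starter): [span, ...]} in one forward pass over the span children.

    id() plays the role of generate-id(); dicts keep insertion order,
    so the series come out in document order.
    """
    table = {}
    starter = None  # starter of the run we are currently in, if any
    for span in (c for c in parent if c.tag == "span"):
        if not has_x(span):
            starter = None      # a span without x ends the current run
            continue
        if starter is None:     # first x span after a non-x span starts a run
            starter = span
            table[id(starter)] = []
        table[id(starter)].append(span)
    return table


doc = ET.fromstring(
    '<p><span>1</span><span class="x">2</span><span class="x">3</span>'
    '<span class="x">4</span><span>5</span><span class="x">6</span>'
    '<span>7</span><span class="x">8</span></p>'
)
print([[s.text for s in run] for run in key_by_starter(doc).values()])
# [['2', '3', '4'], ['6'], ['8']]
```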
With your sample input, this output is generated:
1
<x>234</x>
5
<x>6</x>
7
<x>8</x>
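If you want to sanity-check the logic without an XSLT processor, here is a minimal Python emulation of the template. Assumptions: a hypothetical <p> whose children are all <span>s, whitespace text nodes are ignored (a real processor would also copy those via the built-in templates), and list indices stand in for generate-id(). `emulate` is a made-up name.

```python
import xml.etree.ElementTree as ET


def has_x(el):
    # contains(concat(' ', @class, ' '), ' x ')
    return " x " in " " + (el.get("class") or "") + " "


def emulate(parent):
    spans = [c for c in parent if c.tag == "span"]
    x = [has_x(s) for s in spans]

    def is_starter(i):
        # an x span whose preceding-sibling::span[1] has no x
        return x[i] and not (i > 0 and x[i - 1])

    def closest_starter(i):
        # preceding-sibling::span[x][not(preceding-sibling::span[1][x])][1]
        for j in range(i - 1, -1, -1):
            if is_starter(j):
                return j
        return None

    out = []
    for i, span in enumerate(spans):
        if not x[i]:
            out.append(span.text or "")  # built-in template copies the text
        elif is_starter(i):
            members = [i] + [
                k for k in range(i + 1, len(spans))
                if x[k] and x[k - 1] and closest_starter(k) == i
            ]
            out.append("<x>" + "".join(spans[k].text or "" for k in members) + "</x>")
        # x spans inside a series produce nothing on their own
    return out


doc = ET.fromstring(
    '<p><span>1</span><span class="x">2</span><span class="x">3</span>'
    '<span class="x">4</span><span>5</span><span class="x">6</span>'
    '<span>7</span><span class="x">8</span></p>'
)
print(emulate(doc))  # ['1', '<x>234</x>', '5', '<x>6</x>', '7', '<x>8</x>']
```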