XPath - Select first group of siblings between two nodes

Question

I ran into a little problem when using XPath to query some HTML files in C#.

Ok, first here's a sample HTML:

<table id="theTable">
    <tbody>
        <tr class="theClass">A</tr>
        <tr class="theClass">B</tr>
        <tr>1</tr>
        <tr>2</tr>
        <tr>3</tr>
        <tr>4</tr>
        <tr>5</tr>
        <tr class="theClass">C</tr>
        <tr class="theClass">D</tr>
        <tr>6</tr>
        <tr>7</tr>
        <tr>8</tr>
        <tr>9</tr>
        <tr>10</tr>
        <tr>11</tr>
        <tr>12</tr>
        <tr>13</tr>
        <tr>14</tr>
        <tr>15</tr>
        <tr class="theClass">E</tr>
        <tr class="theClass">F</tr>
        <tr>16</tr>
        <tr>17</tr>
        <tr>18</tr>
        <tr>19</tr>
        <tr>20</tr>
        <tr>21</tr>
        <tr>22</tr>
    </tbody>
</table>

Now, what I'm trying to do is get only those elements that are between the B and C nodes (1, 2, 3, 4, 5).

Here's what I tried so far:

using System;
using System.Xml.XPath;

namespace Test
{
    class Test
    {
        static void Main(string[] args)
        {
            XPathDocument doc = new XPathDocument("Test.xml");
            XPathNavigator nav = doc.CreateNavigator();

            Console.WriteLine(nav.Select("//table[@id='theTable']/tbody/tr[preceding-sibling::tr[@class='theClass'] and following-sibling::tr[@class='theClass']]").Count);
            Console.WriteLine(nav.Select("//table[@id='theTable']/tbody/tr[preceding-sibling::tr[@class='theClass'][2] and following-sibling::tr[@class='theClass'][4]]").Count);

            Console.ReadKey(true);
        }
    }
}

Run against the HTML above, this code outputs 19 and 5. So only the second XPath expression works, but only because it searches for elements that have at least two elements with class=theClass before them and at least four after them.

My problem starts now. I want to write a single expression that will return only the first group of elements that come after a <tr class="theClass"></tr> tag, no matter how many more groups follow it.

If I run my code over this HTML

<table id="theTable">
    <tbody>
        <tr class="theClass">A</tr>
        <tr class="theClass">B</tr>
        <tr>1</tr>
        <tr>2</tr>
        <tr>3</tr>
        <tr>4</tr>
        <tr>5</tr>
        <tr>6</tr>
    </tbody>
</table>

it will output 0 and 0.

So it's no good.

Does anybody have any ideas?

Thank you!

@ChuckSavage I expect, for the first HTML, to have the elements 1,2,3,4,5 returned and for the second HTML, the elements 1,2,3,4,5,6. — Leif Lazar, May 30 '12 at 09:10

Dimitre Novatchev · Accepted Answer · 2012-05-30T12:03:45.307

Now, what I'm trying to do is to get only those elements that are between the B and C nodes

Use this single XPath expression:

   /*/*/tr[.='B']
           /following-sibling::*
             [count(.|/*/*/tr[. ='C']/preceding-sibling::*)
             =
              count(/*/*/tr[. ='C']/preceding-sibling::*)
             ]

Here is an XSLT-based verification:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
  <xsl:copy-of select=
  "/*/*/tr[.='B']
           /following-sibling::*
             [count(.|/*/*/tr[. ='C']/preceding-sibling::*)
             =
              count(/*/*/tr[. ='C']/preceding-sibling::*)
             ]
  "/>
 </xsl:template>
</xsl:stylesheet>

when this transformation is applied on the first provided XML document:

<table id="theTable">
    <tbody>
        <tr class="theClass">A</tr>
        <tr class="theClass">B</tr>
        <tr>1</tr>
        <tr>2</tr>
        <tr>3</tr>
        <tr>4</tr>
        <tr>5</tr>
        <tr class="theClass">C</tr>
        <tr class="theClass">D</tr>
        <tr>6</tr>
        <tr>7</tr>
        <tr>8</tr>
        <tr>9</tr>
        <tr>10</tr>
        <tr>11</tr>
        <tr>12</tr>
        <tr>13</tr>
        <tr>14</tr>
        <tr>15</tr>
        <tr class="theClass">E</tr>
        <tr class="theClass">F</tr>
        <tr>16</tr>
        <tr>17</tr>
        <tr>18</tr>
        <tr>19</tr>
        <tr>20</tr>
        <tr>21</tr>
        <tr>22</tr>
    </tbody>
</table>

the XPath expression is evaluated and the selected nodes are copied to the output:

<tr>1</tr>
<tr>2</tr>
<tr>3</tr>
<tr>4</tr>
<tr>5</tr>

Explanation:

Here we simply use the Kayessian formula for node-set intersection:

$ns1[count(.|$ns2) = count($ns2)]

where we substituted $ns1 with:

 /*/*/tr[.='B']
               /following-sibling::*

and we substituted $ns2 with:

/*/*/tr[. ='C']/preceding-sibling::*
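
Since the question uses C#, the substitution above can be tried directly with XPathNavigator. The following is a minimal sketch, not part of the original answer: the inline document is a shortened, hypothetical copy of the first sample, and the class and variable names (IntersectionDemo, ns1, ns2, between) are illustrative only.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

class IntersectionDemo
{
    static void Main()
    {
        // Shortened copy of the first sample document (assumed inline data).
        const string xml = @"<table id='theTable'><tbody>
            <tr class='theClass'>A</tr><tr class='theClass'>B</tr>
            <tr>1</tr><tr>2</tr><tr>3</tr>
            <tr class='theClass'>C</tr><tr class='theClass'>D</tr>
            <tr>6</tr><tr>7</tr>
        </tbody></table>";

        XPathNavigator nav = new XPathDocument(new StringReader(xml)).CreateNavigator();

        // $ns1: every sibling after B; $ns2: every sibling before C.
        const string ns1 = "/*/*/tr[.='B']/following-sibling::*";
        const string ns2 = "/*/*/tr[.='C']/preceding-sibling::*";

        // Kayessian intersection: a node of $ns1 is kept only if adding it to
        // $ns2 leaves the size of $ns2 unchanged, i.e. it is already in $ns2.
        string between = ns1 + "[count(.|" + ns2 + ") = count(" + ns2 + ")]";

        foreach (XPathNavigator tr in nav.Select(between))
            Console.WriteLine(tr.Value);
    }
}
```

With this shortened input the loop should print 1, 2 and 3, the rows that are both after B and before C.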

The second problem:

My problem starts now. I want to write a single expression that will return only the first group of elements that come after a <tr class="theClass"></tr> tag, no matter how many more groups follow it.

Again a single XPath expression selecting those elements exists:

   /*/*/tr[@class='theClass'
         and
           following-sibling::*[1][self::tr[not(@*)] ]
           ][1]
             /following-sibling::tr
               [not(@*)
              and
                count(preceding-sibling::tr
                       [@class='theClass'
                      and
                        following-sibling::*[1][self::tr[not(@*)] ]
                       ]
                     )
                = 1
               ]

Explanation:

This selects all following-sibling tr elements (that satisfy a number of conditions) of the first /*/*/tr element whose class attribute has the string value "theClass" and whose first following sibling element is a tr that has no attributes.

The selected tr elements must also satisfy two conditions: 1) they have no attributes; and 2) exactly one of their preceding sibling tr elements has a class attribute with the string value "theClass" and is immediately followed by a tr with no attributes. In other words, only one group has started before them, so they belong to the first group.
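
To see how the count(...) = 1 condition cuts the selection off at the second group, the expression can be run from C# against a document with several groups and one with a single group. This is a sketch under assumptions: the row data is a shortened, hypothetical version of the two sample documents, and the names FirstGroupDemo, Marker, FirstGroup and Print are invented for illustration.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

class FirstGroupDemo
{
    // A "group start": a theClass row immediately followed by an attribute-less tr.
    const string Marker =
        "@class='theClass' and following-sibling::*[1][self::tr[not(@*)]]";

    // The answer's expression, assembled from the shared Marker predicate.
    const string FirstGroup =
        "/*/*/tr[" + Marker + "][1]" +
        "/following-sibling::tr[not(@*) and count(preceding-sibling::tr[" + Marker + "]) = 1]";

    static void Print(string label, string rows)
    {
        string xml = "<table id='theTable'><tbody>" + rows + "</tbody></table>";
        XPathNavigator nav = new XPathDocument(new StringReader(xml)).CreateNavigator();

        Console.Write(label + ":");
        foreach (XPathNavigator tr in nav.Select(FirstGroup))
            Console.Write(" " + tr.Value);
        Console.WriteLine();
    }

    static void Main()
    {
        const string head = "<tr class='theClass'>A</tr><tr class='theClass'>B</tr>";

        // Several groups: row 6 has two group starts (B and D) before it, so it is excluded.
        Print("several groups", head + "<tr>1</tr><tr>2</tr>"
            + "<tr class='theClass'>C</tr><tr class='theClass'>D</tr><tr>6</tr>");

        // A single group with no closing theClass row.
        Print("single group", head + "<tr>1</tr><tr>2</tr><tr>3</tr>");
    }
}
```

The first call should list rows 1 and 2 only, and the second rows 1, 2 and 3, because in both cases exactly one group start (B) precedes the selected rows.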

And here is the XSLT-based verification:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
  <xsl:copy-of select=
  "/*/*/tr[@class='theClass'
         and
           following-sibling::*[1][self::tr[not(@*)] ]
           ][1]
             /following-sibling::tr
               [not(@*)
              and
                count(preceding-sibling::tr
                       [@class='theClass'
                      and
                        following-sibling::*[1][self::tr[not(@*)] ]
                       ]
                     )
                = 1
               ]
  "/>
 </xsl:template>
</xsl:stylesheet>

when applied on the second provided XML document:

<table id="theTable">
    <tbody>
        <tr class="theClass">A</tr>
        <tr class="theClass">B</tr>
        <tr>1</tr>
        <tr>2</tr>
        <tr>3</tr>
        <tr>4</tr>
        <tr>5</tr>
        <tr>6</tr>
    </tbody>
</table>

again the wanted and correctly selected elements are output:

<tr>1</tr>
<tr>2</tr>
<tr>3</tr>
<tr>4</tr>
<tr>5</tr>
<tr>6</tr>

Thank you very much, it works. Could you also leave an explanation of the final expression? I'm not sure I understood it. Thank you! — Leif Lazar, May 30 '12 at 09:16
@LeifLazar: You are welcome. I edited the answer and added explanations for both expressions. — Dimitre Novatchev, May 30 '12 at 12:04

score 1 · Answer 2 · answered May 30 '12 at 01:02

If you don't have to do everything in XPath, some LINQ may be easier to get right and more readable.

In your case, a combination of SkipWhile, Skip and TakeWhile along the following lines could work:

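// requires: using System.Linq; using System.Xml.XPath;
nav.Select("//table[@id='theTable']/tbody/tr")                   // all tr rows, in document order
   .Cast<XPathNavigator>()                                       // XPathNodeIterator is a non-generic IEnumerable
   .SkipWhile(tr => tr.Value != "B")                             // skip everything before the B row
   .Skip(1)                                                      // skip the B row itself
   .TakeWhile(tr => tr.GetAttribute("class", "") != "theClass"); // stop at the next theClass row (C), or at the end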
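
For completeness, here is a self-contained sketch of the same skip-then-take idea that keys on the class attribute instead of the row text, so it returns the first group whether or not more groups follow. It is an illustration under assumptions, not the answerer's code: the inline document is shortened, hypothetical data, and LinqDemo, IsMarker and firstGroup are made-up names.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml.XPath;

class LinqDemo
{
    // A marker row is any tr whose class attribute is "theClass".
    static bool IsMarker(XPathNavigator tr) => tr.GetAttribute("class", "") == "theClass";

    static void Main()
    {
        const string xml = "<table id='theTable'><tbody>"
            + "<tr class='theClass'>A</tr><tr class='theClass'>B</tr>"
            + "<tr>1</tr><tr>2</tr><tr>3</tr>"
            + "<tr class='theClass'>C</tr><tr>6</tr>"
            + "</tbody></table>";

        XPathNavigator nav = new XPathDocument(new StringReader(xml)).CreateNavigator();

        var firstGroup = nav.Select("//table[@id='theTable']/tbody/tr")
            .Cast<XPathNavigator>()              // XPathNodeIterator is a non-generic IEnumerable
            .SkipWhile(tr => !IsMarker(tr))      // skip anything before the first marker row
            .SkipWhile(tr => IsMarker(tr))       // skip the leading run of marker rows (A, B)
            .TakeWhile(tr => !IsMarker(tr));     // take rows until the next marker row, if any

        // Values are read while enumerating, so each row is read at its own position.
        Console.WriteLine(string.Join(",", firstGroup.Select(tr => tr.Value)));
    }
}
```

With the data above this should print 1,2,3. Dropping the C row and everything after it from the input leaves the result unchanged, which is the behaviour the question asks for in the single-group case.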
