XPath: Select following siblings until certain class

Question

I have the following html snippet:

<table>
    <tr>
        <td class="foo">a</td>
            <td class="bar">1</td>
            <td class="bar">2</td>
        <td class="foo">b</td>
            <td class="bar">3</td>
            <td class="bar">4</td>
            <td class="bar">5</td>
        <td class="foo">c</td>
            <td class="bar">6</td>
            <td class="bar">7</td>
    </tr>
</table>

I'm looking for an XPath 1.0 expression that starts at a .foo element and selects all following .bar elements before the next .foo element.
For example: I start at a and want to select only 1 and 2.
Or I start at b and want to select 3, 4 and 5.

Background: I have to find an XPath expression for this method (using Java and Selenium):

public List<WebElement> bar(WebElement foo) {
    return foo.findElements(By.xpath("./following-sibling::td[@class='bar']..."));
}

Is there a way to solve the problem?
The expression should work for all .foo elements without using any external variables.

Thanks for your help!

Update: There is apparently no solution for these special circumstances. But if you have fewer limitations, the provided expressions work perfectly.

What you want is what is matched by the second expression you gave? It means: give me the first `.bar` child and all its following `.bar` siblings until the first `.foo`. Is that it? — acdcjunior, Sep 13 '15 at 22:17
possible duplicate of [XPath : select all following siblings until another sibling](http://stackoverflow.com/questions/2161766/xpath-select-all-following-siblings-until-another-sibling) — nwellnhof, Sep 14 '15 at 00:09
@nwellnhof I believe it's not a duplicate because all solutions there rely on either empty text of the node, a certain id, XSLT context and functions, or XPath 2.0. None of this applies here. — GSerg, Sep 14 '15 at 00:16

Abel · Accepted Answer · 2015-09-14T07:31:00.613

4

Good question!

The following expression will give you 1..2, 3..5 or 6..7, depending on the index you put in the predicate. The index is X + 1, where X is the number of the set you want ([2] gives 1..2, [3] gives 3..5, etc.). In the example, I select the third set, hence the [4]:

/table/tr[1]
  /td[not(@class = 'foo')]
  [
     generate-id(../td[@class='foo'][4]) 
     = generate-id(
         preceding-sibling::td[@class='foo'][1]
        /following-sibling::td[@class='foo'][1])
  ]

The beauty of this expression (imnsho) is that you can index by the given set (as opposed to indexing by relative position) and that it has only one place where you need to update the expression. If you want the sixth set, just type [7].

This expression works for any situation where you need the siblings between any two nodes that meet the same requirement (@class = 'foo'). I'll update with an explanation.

Replace the [4] in the expression with whatever set you need, plus 1. In oXygen, the above expression selects nodes 6 and 7.

Explanation

/table/tr[1]

Selects the first tr.

/td[not(@class = 'foo')]

Selects any td that is not a foo.

generate-id(../td[@class='foo'][4])

Gets the identity of the xth foo. In this case there is no fourth foo, so this selects the empty node-set and generate-id returns the empty string. In all other cases, it will return the identity of the next foo that we are interested in.

generate-id(
    preceding-sibling::td[@class='foo'][1]
    /following-sibling::td[@class='foo'][1])

Gets the identity of the first previous foo (counting backward from any non-foo element) and from there, the first following foo. In the case of node 7, this returns the identity of nothingness, resulting in true for our example case of [4]. In the case of node 3, this will result in c, which is not equal to nothingness, resulting in false.

If the example had [2] instead, this last bit would return node b for nodes 1 and 2, which is equal to the identity of ../td[@class='foo'][2], returning true. For nodes 4, 7, etc., this will return false.

Update, alternative #1

We can replace the generate-id function with a count over the preceding-sibling axis. Since each foo node has a different number of preceding siblings, comparing those counts works as an alternative to generate-id.

By now it starts to grow just as unwieldy as GSerg's answer, though:

/table/tr[1]
  /td[not(@class = 'foo')]
  [
     count(../td[@class='foo'][4]/preceding-sibling::*) 
     = count(
         preceding-sibling::td[@class='foo'][1]
        /following-sibling::td[@class='foo'][1]/preceding-sibling::*)
  ]

The same "indexing" method applies. Where I write [4] above, replace it with n + 1, where n is the position of the set you are interested in.
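As a rough check of the count-based variant, here is a minimal Java sketch that evaluates it with the JDK's built-in XPath 1.0 engine (javax.xml.xpath) instead of Selenium. The class name `CountBasedSets`, the helper `selectSet` and the inlined table are assumptions made for illustration; they are not part of the answer above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, written only to exercise the expression.
final class CountBasedSets {
    // The question's table as well-formed XML.
    static final String TABLE =
        "<table><tr>"
        + "<td class='foo'>a</td><td class='bar'>1</td><td class='bar'>2</td>"
        + "<td class='foo'>b</td><td class='bar'>3</td><td class='bar'>4</td>"
        + "<td class='bar'>5</td>"
        + "<td class='foo'>c</td><td class='bar'>6</td><td class='bar'>7</td>"
        + "</tr></table>";

    // Returns the text of the non-foo cells in the given 1-based set.
    static List<String> selectSet(int set) {
        int index = set + 1; // the answer indexes by "set + 1"
        String expr =
            "/table/tr[1]/td[not(@class = 'foo')]"
            + "[count(../td[@class='foo'][" + index + "]/preceding-sibling::*)"
            + " = count(preceding-sibling::td[@class='foo'][1]"
            + "/following-sibling::td[@class='foo'][1]/preceding-sibling::*)]";
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, new InputSource(new StringReader(TABLE)),
                          XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling `CountBasedSets.selectSet(2)` builds the expression with [3] and should yield the cells between b and c. The set number is still baked into the expression, so this does not help when only a context node is available.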

edited Sep 14 '15 at 07:31

answered Sep 14 '15 at 00:20

Abel

I believe this will only work in an XSLT context. It will not work as a stand-alone XPath expression (e.g. for [`XmlDocument.SelectNodes`](https://msdn.microsoft.com/en-us/library/hcebdtae%28v=vs.110%29.aspx)). – GSerg Sep 14 '15 at 00:23
@GSerg, Selenium uses the [browser's XPath capabilities](http://acdcjunior.github.io/xpath-2-3-version-selenium-webdriver-support.html). Browsers support the generate-id function as far as I can remember. Though I admit, I haven't tried it with selenium. – Abel Sep 14 '15 at 00:41
Unfortunately the expression doesn't work with Selenium ("invalid selector"). Apparently Firefox doesn't know the `generate-id()` function. – Jonas Staudenmeir Sep 14 '15 at 01:31
@JonasStaudenmeir, looks like they have decided to make that function only available when you invoke the internal XSLT processor, [MDN shows 'XSLT specific'](https://developer.mozilla.org/en-US/docs/Web/XPath/Functions/generate-id) for that function, which is unfortunate. They have it, but decided to limit its scope... – Abel Sep 14 '15 at 01:50
@JonasStaudenmeir ...And since the `=`-operator does not compare nodes. Though you could compare content using `string()` instead of `generate-id()`, or another unique part (in your example, content is different, so it would work, but still). – Abel Sep 14 '15 at 01:54
Thanks, that's a great idea. But how would I generalize the expression to work without knowing the set index (see my code snippet)? The `current()` function doesn't work either in Firefox. – Jonas Staudenmeir Sep 14 '15 at 02:25
@JonasStaudenmeir, yes, [on that same list](https://developer.mozilla.org/en-US/docs/Web/XPath/Functions) it is listed as XSLT specific as well. Strangely, other XSLT specific functions, like `element-available`, _are_ available in XPath in Firefox... If you are _that limited in available functions_, I would go with the (slightly harder) solution provided by GSerg. Unless you could work out a common difference (as in your example, which is the string-value). – Abel Sep 14 '15 at 02:43
@JonasStaudenmeir, about generalizing the expression, this one already is. You mean GSerg's expression? Or did you mean, how to generalize the part that has to find some kind of node identity? There's another trick, that works for your special case (siblings), I'll update in a bit. – Abel Sep 14 '15 at 07:21
The problem is that I don't know how the `.foo` element was selected (or which index it has). I just get the `WebElement` and have to write a relative XPath expression that starts with `./` (see my code snippet). – Jonas Staudenmeir Sep 14 '15 at 12:29
@JonasStaudenmeir, ah, I didn't get that, and not from your OP, which starts with `//td`. You also specified _"The expression should work for all `.foo`"_. Sorry that I misunderstood. Will update in a bit. – Abel Sep 14 '15 at 12:38
@JonasStaudenmeir That means you need to be able to [get the context of the outer predicate from the inner predicate](http://stackoverflow.com/q/6595034/11683). Competent people say you [cannot do that with an XPath 1.0 expression](http://stackoverflow.com/a/6599393/11683). And by the way, if that is your situation, why can you not make several xpath queries against that context node? First one to get the next `foo`, if any, another two to get the two sets of `bar`, then subtract the two sets manually using referential equality? – GSerg Sep 14 '15 at 14:05
@JonasStaudenmeir, as GSerg says. Unless you manage to use Selenium with a browser that supports `current()` and `generate-id()`, this will be (virtually?) impossible. The [solution by Lingamurthy solves the issue](http://stackoverflow.com/a/32556004/111575) with a similar technique as I demonstrated, but then for the context node set to one of the `foo` td's. This requires both these functions. – Abel Sep 14 '15 at 14:53

score 2 · Answer 2 · edited May 23 '17 at 11:58

~~So you want an intersection of two sets:~~

following-sibling::td[@class='bar'] that follow your starting td[@class='foo'] node
preceding-sibling::td[@class='bar'] that precede the next td[@class='foo'] node

Given the formula from the linked question, it is not difficult to get:

//td[1]/following-sibling::td[@class='bar'][count(. | (//td[1]/following-sibling::td[@class='foo'])[1]/preceding-sibling::td[@class='bar']) = count((//td[1]/following-sibling::td[@class='foo'])[1]/preceding-sibling::td[@class='bar'])]

~~However this will return an empty set for the last foo node because there is no next foo node to take precedings from.~~

So you want a difference of two sets:

following-sibling::td[@class='bar'] that follow your starting td[@class='foo'] node
following-sibling::td[@class='bar'] that follow the next td[@class='foo'] node

Given the formula from the linked question, it is not difficult to get:

//td[1]/following-sibling::td[@class='bar'][
    count(. | (//td[1]/following-sibling::td[@class='foo'])[1]/following-sibling::td[@class='bar'])
    !=
    count((//td[1]/following-sibling::td[@class='foo'])[1]/following-sibling::td[@class='bar'])
]

The only amendable bit is the starting point, //td[1] (three times).

Now this will properly return bar nodes even for the last foo node.
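To see the set-difference formula at work for different starting points, here is a minimal Java sketch using the JDK's javax.xml.xpath engine rather than Selenium. The class `SetDifferenceBars`, the method `barsAfter` and the inlined table are hypothetical names introduced for illustration. The starting path is substituted three times, as described above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, written only to exercise the expression.
final class SetDifferenceBars {
    // The question's table as well-formed XML.
    static final String TABLE =
        "<table><tr>"
        + "<td class='foo'>a</td><td class='bar'>1</td><td class='bar'>2</td>"
        + "<td class='foo'>b</td><td class='bar'>3</td><td class='bar'>4</td>"
        + "<td class='bar'>5</td>"
        + "<td class='foo'>c</td><td class='bar'>6</td><td class='bar'>7</td>"
        + "</tr></table>";

    // startPath is an absolute XPath to the starting foo cell, e.g. "//td[1]".
    static List<String> barsAfter(String startPath) {
        // set1: every bar after the starting foo.
        String set1 = startPath + "/following-sibling::td[@class='bar']";
        // set2: every bar after the next foo (empty if there is no next foo).
        String set2 = "(" + startPath + "/following-sibling::td[@class='foo'])[1]"
            + "/following-sibling::td[@class='bar']";
        // set1 minus set2: $set1[count(. | $set2) != count($set2)]
        String expr = set1 + "[count(. | " + set2 + ") != count(" + set2 + ")]";
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, new InputSource(new StringReader(TABLE)),
                          XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In the example table the foo cells sit at `//td[1]`, `//td[4]` and `//td[8]`, so `barsAfter("//td[8]")` covers the case of the last foo, where the second set is empty.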

The above was written under the impression that you need to have a single XPath query and nothing more. Now that it's clear you don't, you can easily solve your problem with more than one XPath query and some manual list filtering on referential equality, as I already mentioned in a comment.

In C# that would be:

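using System.Collections.Generic;
using System.Linq;
using System.Xml;

// Starting foo node; //td[8] is the "c" cell in the example.
XmlNode context = xmlDocument.SelectSingleNode("//td[8]");
// The next foo after the context node, or null if there is none.
XmlNode nextFoo = context.SelectSingleNode("(./following-sibling::td[@class='foo'])[1]");

// All bar nodes after the context node.
IEnumerable<XmlNode> result = context.SelectNodes("./following-sibling::td[@class='bar']").Cast<XmlNode>();

if (nextFoo != null)
{
    // Intersect filters using referential equality by default
    result = result.Intersect(nextFoo.SelectNodes("./preceding-sibling::td[@class='bar']").Cast<XmlNode>());
}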

I'm sure it's trivial to convert to Java.
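A possible Java version of the same idea is sketched below. It runs against a plain DOM with the JDK's javax.xml.xpath API, not against Selenium, so it can be tried without a browser. The names `MultiQueryBars`, `select`, `barsAfter` and `barTextsAfter` are made up for this sketch. With Selenium the same shape would presumably use `foo.findElements(By.xpath(...))`, assuming `WebElement.equals` identifies the same DOM element, which I have not verified.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class: several relative queries plus manual filtering.
final class MultiQueryBars {
    static final String TABLE =
        "<table><tr>"
        + "<td class='foo'>a</td><td class='bar'>1</td><td class='bar'>2</td>"
        + "<td class='foo'>b</td><td class='bar'>3</td><td class='bar'>4</td>"
        + "<td class='bar'>5</td>"
        + "<td class='foo'>c</td><td class='bar'>6</td><td class='bar'>7</td>"
        + "</tr></table>";

    static final XPath XPATH = XPathFactory.newInstance().newXPath();

    // Evaluates expr relative to context and copies the result into a list.
    static List<Node> select(String expr, Node context) {
        try {
            NodeList found = (NodeList) XPATH.evaluate(expr, context, XPathConstants.NODESET);
            List<Node> nodes = new ArrayList<>();
            for (int i = 0; i < found.getLength(); i++) {
                nodes.add(found.item(i));
            }
            return nodes;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Only relative expressions are used, so foo can be any foo cell.
    static List<Node> barsAfter(Node foo) {
        List<Node> result = select("./following-sibling::td[@class='bar']", foo);
        List<Node> nextFoo = select("(./following-sibling::td[@class='foo'])[1]", foo);
        if (!nextFoo.isEmpty()) {
            // Keep only the bars that also precede the next foo (node identity).
            List<Node> before = select("./preceding-sibling::td[@class='bar']", nextFoo.get(0));
            result.removeIf(bar -> before.stream().noneMatch(bar::isSameNode));
        }
        return result;
    }

    // Convenience wrapper for the example table: looks a foo cell up by its text.
    static List<String> barTextsAfter(String fooText) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(TABLE)));
            Node foo = select("//td[@class='foo'][. = '" + fooText + "']", doc).get(0);
            List<String> texts = new ArrayList<>();
            for (Node bar : barsAfter(foo)) {
                texts.add(bar.getTextContent());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Here `barsAfter` needs nothing but the context node, which is the constraint from the question. The lookup by text in `barTextsAfter` exists only to obtain a starting node for the demonstration.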

edited May 23 '17 at 11:58

Community

answered Sep 14 '15 at 00:05

GSerg

Oh, sorry, I stumbled on the "only ...", and read "three times" as having three situations in the example, i.e., three sets. – Abel Sep 14 '15 at 00:43
Yep, works for me too :). It'd be worthwhile to break it down in pieces, though, it is hard to make out the expression... – Abel Sep 14 '15 at 00:44
@Abel I'm not sure where to put the line breaks, it looks confusing either way. The point of the expression is nicely shown in [that question](http://stackoverflow.com/q/7178471/11683): `$set1[count(. | $set2) != count($set2)]`, then you substitute `$set1` with `//td[1]/following-sibling::td[@class='bar']` (once) and `$set2` with `(//td[1]/following-sibling::td[@class='foo'])[1]/following-sibling::td[@class='bar']` (twice). – GSerg Sep 14 '15 at 00:49
Yes, I'm aware of the technique (though I really prefer XPath 2.0 and 3.0, where this is simply expressed with the `<<` operator), but I won't be the only reader here. Anyway, it's up to you ;) – Abel Sep 14 '15 at 00:55
@Abel yes I like XPath 2.0 too, but the OP is specifically looking for an XPath 1.0 solution. – GSerg Sep 14 '15 at 01:00
Thanks for your great answer! But how would I generalize the second and third occurrence of `//td[1]`? I can't replace them with `.` like the first one. Unfortunately my code has to work without knowing the specific path to the `.foo` element. – Jonas Staudenmeir Sep 14 '15 at 01:23
@JonasStaudenmeir But you have to know how to select the starting `foo` element in one form or another - and the xpath for selecting it you can place instead of the three occurrences of `//td[1]`. If it is okay for you to have one `//td[1]` (that you would replace in your xpath each time you execute the query), then it must be equally okay to have it three times? – GSerg Sep 14 '15 at 08:11
The problem is that I don't know how the `.foo` element was selected. I just get the `WebElement` and have to write a relative XPath expression that starts with `./` (see my code snippet). – Jonas Staudenmeir Sep 14 '15 at 12:26

score 2 · Answer 3 · answered Sep 14 '15 at 00:55

If the current node is one of the td[@class='foo'] elements, you can use the XPath below to get the following td[@class='bar'] elements that precede the next foo td:

following-sibling::td[@class='bar'][generate-id(preceding-sibling::td[@class='foo'][1]) = generate-id(current())]

Here, you select only those td[@class='bar'] whose first preceding td[@class='foo'] is the same as the current node you are iterating on (confirmed using generate-id()).
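Since `current()` and `generate-id()` are XSLT functions, the expression can be tried inside a stylesheet. The following Java sketch runs it through the JDK's built-in XSLT 1.0 processor (javax.xml.transform). The class `XsltCurrentDemo`, the method `groups`, the stylesheet and the "label:values;" output format are all assumptions made for this illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class: evaluates the expression in an XSLT context.
final class XsltCurrentDemo {
    static final String TABLE =
        "<table><tr>"
        + "<td class='foo'>a</td><td class='bar'>1</td><td class='bar'>2</td>"
        + "<td class='foo'>b</td><td class='bar'>3</td><td class='bar'>4</td>"
        + "<td class='bar'>5</td>"
        + "<td class='foo'>c</td><td class='bar'>6</td><td class='bar'>7</td>"
        + "</tr></table>";

    // For each foo cell, prints its text, a colon, the text of its bars, and ";".
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:for-each select=\"//td[@class='foo']\">"
        + "<xsl:value-of select='.'/><xsl:text>:</xsl:text>"
        // Inside this select, current() is the foo cell of the outer for-each.
        + "<xsl:for-each select=\"following-sibling::td[@class='bar']"
        + "[generate-id(preceding-sibling::td[@class='foo'][1])"
        + " = generate-id(current())]\">"
        + "<xsl:value-of select='.'/>"
        + "</xsl:for-each>"
        + "<xsl:text>;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Runs the stylesheet over the table and returns the text output.
    static String groups() {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(TABLE)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

`XsltCurrentDemo.groups()` should return one group per foo cell. This only shows the expression working where an XSLT context exists; it does not change the situation for a stand-alone XPath call in Selenium.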

answered Sep 14 '15 at 00:55

Lingamurthy CS

Hey, you use the same generate-id approach I used. Independently? ;) – Abel Sep 14 '15 at 00:56
@Abel Yes, I did :) I think I made it more generic to be used by any `td[@class='foo']`. – Lingamurthy CS Sep 14 '15 at 00:59
All solutions here work with any starting point. The difference is the focus-approach (which allows you to use `current()` and makes the expression a lot simpler). But I think in Selenium, focus is typically the whole document. – Abel Sep 14 '15 at 01:03
@Abel No doubt, sir :) Just that I felt the OP wants to start with one of the `td` elements as he said _I'm looking for a XPath 1.0 expression that starts at a .foo_. – Lingamurthy CS Sep 14 '15 at 01:06
Unfortunately the expression doesn't work with Selenium ("invalid selector"). Apparently Firefox doesn't know the `generate-id()` function. – Jonas Staudenmeir Sep 14 '15 at 01:32

Rudolf Yurgenson · Answer 4 · 2015-09-14T09:13:50.033

0

Pretty straightforward (example for 'a' td) but not very optimal:

//td[
    @class='bar' and 
    preceding-sibling::td[@class='foo'][1][text() = 'a'] and 
    (
       not(following-sibling::td[@class='foo']) or 
       following-sibling::td[@class='foo'][1][preceding-sibling::td[@class='foo'][1][text() = 'a']]
    )
]
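To try this for labels other than 'a', the expression can be parameterized. Below is a minimal Java sketch using the JDK's javax.xml.xpath engine. The class `LabelBasedBars` and the method `barsFor` are hypothetical, and the approach only works while each foo cell has a unique text, as in the example table.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, written only to exercise the expression.
final class LabelBasedBars {
    // The question's table as well-formed XML.
    static final String TABLE =
        "<table><tr>"
        + "<td class='foo'>a</td><td class='bar'>1</td><td class='bar'>2</td>"
        + "<td class='foo'>b</td><td class='bar'>3</td><td class='bar'>4</td>"
        + "<td class='bar'>5</td>"
        + "<td class='foo'>c</td><td class='bar'>6</td><td class='bar'>7</td>"
        + "</tr></table>";

    // label is the text of the starting foo cell; it must not contain quotes,
    // because it is pasted into the expression as a string literal.
    static List<String> barsFor(String label) {
        String isLabel = "[text() = '" + label + "']";
        String expr =
            "//td[@class='bar' and "
            + "preceding-sibling::td[@class='foo'][1]" + isLabel + " and "
            + "(not(following-sibling::td[@class='foo']) or "
            + "following-sibling::td[@class='foo'][1]"
            + "[preceding-sibling::td[@class='foo'][1]" + isLabel + "])]";
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, new InputSource(new StringReader(TABLE)),
                          XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

`LabelBasedBars.barsFor("c")` exercises the `not(following-sibling::td[@class='foo'])` branch, since c is the last foo cell.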

edited Sep 14 '15 at 09:13

answered Sep 14 '15 at 09:04

Rudolf Yurgenson
