//p[not(ancestor::*[3])]
//table[ancestor::*[1][self::p] or ancestor::*[2][self::p]]
tr/td//a[ancestor::*[1][self::td] or ancestor::*[2][self::td]]

-
Perhaps it means you need to find an alternative way to locate that paragraph since the xpath is so convoluted. – Brian Donovan Dec 31 '10 at 16:37
-
@Brian Donovan - The expression selects `a` tags. – Oded Dec 31 '10 at 16:39
-
@Oded: Right you are, didn't scroll all the way over :) Except that it's not well-formed as far as I can tell. This part looks like it needs a `//` before the `tr`: `...[self::p]]tr/td//a...`. – Brian Donovan Dec 31 '10 at 16:45
-
@Brian Donovan - You are quite right. – Oded Dec 31 '10 at 16:46
-
Good question, +1. See my answer for an explanation. :) – Dimitre Novatchev Dec 31 '10 at 18:09
3 Answers
//                             # from the root node, look at all descendants
p[                             # select <p> elements that have…
  not(ancestor::*[3])          # …no third ancestor (at most two ancestor elements)
]                              #
//                             # from these nodes, select descendants
table[                         # of type <table> that have…
  ancestor::*[1][self::p]      # …a <p> as their parent
  or                           # or
  ancestor::*[2][self::p]      # …a <p> as their grandparent
]                              #
                               # syntax error: a "/" is missing here before the next location step
tr                             # …select all <tr> elements
/                              # from their children…
td                             # …select all <td> elements
//                             # from their descendants…
a[                             # …select all <a> elements that have
  ancestor::*[1][self::td]     # …a <td> as their parent
  or                           # or
  ancestor::*[2][self::td]     # …a <td> as their grandparent
]
Or, expressed in HTML:
<html>
  <body>
    <p>
      <table>
        <tr>
          <td>
            <a title="These would be selected." />
          </td>
        </tr>
      </table>
    </p>
  </body>
</html>
The whole XPath does not make too much sense anyway. It goes without saying that `<p><table>` is invalid HTML.
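To check this reading against the sample document, here is a minimal Python sketch. It assumes the corrected expression (with a `/` before `tr`). The standard library's `xml.etree.ElementTree` does not support the `ancestor` axis, so the sketch emulates each predicate by hand with a parent map rather than evaluating the XPath directly. The helper names (`ancestors`, `nth_ancestor_is`, `select_links`) are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The sample document from above (parsed as XML, since it is not valid HTML anyway).
DOC = """
<html>
  <body>
    <p>
      <table>
        <tr>
          <td>
            <a title="These would be selected." />
          </td>
        </tr>
      </table>
    </p>
  </body>
</html>
"""

def ancestors(node, parent_of):
    """Return the element ancestors of node, nearest first (like ancestor::*)."""
    chain = []
    while node in parent_of:
        node = parent_of[node]
        chain.append(node)
    return chain

def nth_ancestor_is(node, n, tag, parent_of):
    """Emulate the predicate ancestor::*[n][self::tag]."""
    chain = ancestors(node, parent_of)
    return len(chain) >= n and chain[n - 1].tag == tag

def select_links(root):
    """Emulate the corrected expression step by step."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    selected = []
    # //p[not(ancestor::*[3])]
    for p in root.iter("p"):
        if len(ancestors(p, parent_of)) >= 3:
            continue
        # //table[ancestor::*[1][self::p] or ancestor::*[2][self::p]]
        for table in p.iter("table"):
            if not (nth_ancestor_is(table, 1, "p", parent_of)
                    or nth_ancestor_is(table, 2, "p", parent_of)):
                continue
            # /tr/td (child steps), then //a (descendant step)
            for tr in table.findall("tr"):
                for td in tr.findall("td"):
                    for a in td.iter("a"):
                        near_td = (nth_ancestor_is(a, 1, "td", parent_of)
                                   or nth_ancestor_is(a, 2, "td", parent_of))
                        # XPath node-sets contain no duplicates, so skip repeats.
                        if near_td and a not in selected:
                            selected.append(a)
    return selected

root = ET.fromstring(DOC)
titles = [a.get("title") for a in select_links(root)]
print(titles)  # ['These would be selected.']
```

With a full XPath 1.0 engine the corrected expression could be evaluated directly. The manual version is only meant to make each step visible.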

Let's break this down:
//p[not(ancestor::*[3])]
Selects all `p` tags that do not have a third ancestor.
In those:
//table[ancestor::*[1][self::p] or ancestor::*[2][self::p]]
It selects all `table` tags whose first or second ancestor is a `p` tag.
Then:
tr/td//a[ancestor::*[1][self::td] or ancestor::*[2][self::td]]
This isn't entirely correct (there should be a `/` at the start). However, it walks down `tr/td//` to select all `a` tags whose first or second ancestor is a `td` tag.
All in all, it is very convoluted, and the same result could probably be achieved far more easily with some `id` attributes defined in the relevant places.
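As a rough illustration of the `id` suggestion, here is a small Python sketch. It assumes the page author had put an `id` on the table. The markup and the value `links` are hypothetical and not taken from the original page.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the id "links" is an assumption for illustration only.
DOC = """
<html>
  <body>
    <div>
      <table id="links">
        <tr>
          <td><a href="/first">first</a></td>
          <td><span><a href="/second">second</a></span></td>
        </tr>
      </table>
    </div>
  </body>
</html>
"""

root = ET.fromstring(DOC)

# Roughly what //table[@id='links']//a would select in full XPath:
# find the table by its id, then collect every <a> beneath it.
table = root.find(".//table[@id='links']")
hrefs = [a.get("href") for a in table.iter("a")]
print(hrefs)  # ['/first', '/second']
```

Anchoring on an `id` means the selection no longer depends on how deep the table sits in the page, which is what made the original expression so fragile.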

-
Having written a fair number of xpaths for the purpose of screen scraping, I'd say this one is horribly broken. Not only was it not well-formed, but it's incredibly fragile. It'd break at the slightest change to the page. – Brian Donovan Dec 31 '10 at 16:52
The XPath expression you specified:
//p[not(ancestor::*[3])]
//table[ancestor::*[1][self::p] or ancestor::*[2][self::p]]
tr/td//a
isn't syntactically legal: it is missing a `/` before the `tr`.
The corrected XPath expression:
//p[not(ancestor::*[3])]
//table[ancestor::*[1][self::p] or ancestor::*[2][self::p]]
/tr/td//a
was provided as an answer to this question.
As explained in the linked answer, the meaning is:
This selects all `a` elements whose parent or grandparent is a `td`, whose parent is a `tr`, whose parent is a `table`, whose parent or grandparent is a `p` that has fewer than 3 ancestor elements.
The OP wanted a way to get the `a` elements that can be located under a `p` buried not deeper than 3 levels beneath the root of the document, then under a `table/tr/td` where the `table` is buried at a level not greater than 3 from the `p`.
Certainly, wanting to select such nodes may not seem too meaningful, but we are in no position to judge anyone's needs and requirements.
The amazing fact is that XPath is powerful enough to satisfy even such requirements.
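The depth constraint on `p` can be illustrated with a short Python sketch. `ancestor::*[3]` is non-empty exactly when an element has at least three element ancestors, so `not(ancestor::*[3])` keeps only the `p` elements with at most two. The document and the `id` values below are hypothetical, and the predicate is emulated with a parent map because `xml.etree.ElementTree` does not support the `ancestor` axis.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with <p> elements at two different depths.
DOC = """
<html>
  <body>
    <p id="shallow"/>
    <div>
      <p id="deep"/>
    </div>
  </body>
</html>
"""

def ancestor_count(node, parent_of):
    """Count the element ancestors of node, i.e. count(ancestor::*)."""
    count = 0
    while node in parent_of:
        node = parent_of[node]
        count += 1
    return count

root = ET.fromstring(DOC)
parent_of = {child: parent for parent in root.iter() for child in parent}

# Emulates //p[not(ancestor::*[3])]: keep <p> with fewer than 3 ancestors.
kept = [p.get("id") for p in root.iter("p") if ancestor_count(p, parent_of) < 3]
print(kept)  # ['shallow']
```

The shallow `p` has two ancestors (`body`, `html`) and is kept. The deep one has three (`div`, `body`, `html`) and is filtered out.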

-
A minor: Isn't `ancestor::*[1][self::p]` equal to `parent::p`? It looks like you'd used that to make the expression readable... – Jan 01 '11 at 21:44
-
@Alejandro: Exactly, this is for the user to see the pattern (`ancestor::*[N][self::p]`), so I think uniformity is helpful here. BTW, Happy New Year! :) – Dimitre Novatchev Jan 01 '11 at 22:11
-