I'd like to select the following HTML in a document based on the content of TARGET, i.e. if TARGET matches, select everything shown below. However, I'm not sure where to go after: id('page')/x:div/span/a='TARGET'. How do I use parent, child, and sibling expressions to get the containing div, the a preceding that div, and the two br tags following the div?

<a></a>
<div>
    <br />
    <span>
        <a>TARGET</a>
        <a></a>
        <span>
            <span>
                <a></a>
            </span>
            <a></a>
            <span></span>
        </span>
        <span>
            <a></a>
        </span>
    </span>
</div>
<br />
<br />
Jeyekomon
urschrei

2 Answers

Use a single XPath like:

"//*[
     (self::a and following-sibling::*[1][self::div and span/a='TARGET']) or
     (self::div and span/a='TARGET') or
     (self::br and preceding-sibling::*[1][self::div and span/a='TARGET']) or
     (self::br and preceding-sibling::*[2][self::div and span/a='TARGET'])
    ]"

Do note that your document is not well formed due to unclosed br tags. Moreover, I didn't include any namespace, which you can add if necessary.
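
On the namespace point, here is a minimal sketch of how a prefix such as the x: in the question can be bound when querying. It uses Python's standard xml.etree.ElementTree, which supports only a subset of XPath (no sibling axes, no "or"), so it covers just the div test rather than the full expression above. The namespace URI, the body wrapper, and the helper name divs_with_target are assumptions made for the demo, not something taken from the question.

```python
import xml.etree.ElementTree as ET

# Assumption: the prefix "x" from the question maps to the XHTML namespace.
NS = {"x": "http://www.w3.org/1999/xhtml"}

# Hypothetical, trimmed-down document; the body wrapper is invented for the demo.
DOC = """
<body xmlns="http://www.w3.org/1999/xhtml">
  <a/>
  <div><br/><span><a>TARGET</a><a/></span></div>
  <br/>
  <br/>
</body>
"""


def divs_with_target(root, target="TARGET"):
    """Return the div elements that have a span/a descendant path whose text equals target."""
    return [
        div
        for div in root.iterfind(".//x:div", NS)
        if any(a.text == target for a in div.iterfind("x:span/x:a", NS))
    ]


root = ET.fromstring(DOC)
matches = divs_with_target(root)
print([el.tag for el in matches])  # prints ['{http://www.w3.org/1999/xhtml}div']
```

With a full XPath 1.0 engine the same prefix mapping would be passed alongside the expression, and every element name in it (a, div, span, br) would need the prefix.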

Emiliano Poggi

You should probably first find all matching divs (I'm not sure exactly which conditions need to be met):

//div[span[a[text()="TARGET"]]][preceding-sibling::*[1][name()="a"]][following-sibling::*[1][name()="br"]]

After that, select the related elements relative to each div:

   ./preceding-sibling::a[1]
   ./following-sibling::br[1]
   ./following-sibling::br[2]
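
The two steps above can be sketched in Python roughly as follows. This is only an illustration under some assumptions: it uses the standard xml.etree.ElementTree, which has no preceding-sibling or following-sibling axes, so sibling navigation is emulated with positions in the parent's child list instead of the XPath expressions themselves. The root wrapper, the id attributes, and the function name select_group are hypothetical additions for the demo.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper element "root" added so the fragment parses as one document;
# the id attributes exist only so the selected elements can be told apart.
DOC = """
<root>
  <a id="before"/>
  <div>
    <br/>
    <span>
      <a>TARGET</a>
      <a/>
    </span>
  </div>
  <br id="first"/>
  <br id="second"/>
</root>
"""


def select_group(root, target="TARGET"):
    """For each matching div, return [preceding a, div, following br, following br]."""
    groups = []
    # Visit every element as a potential parent so sibling positions are known.
    for parent in root.iter():
        children = list(parent)
        for i, div in enumerate(children):
            if div.tag != "div":
                continue
            # Step 1: the div must contain span/a with the target text.
            if not any(a.text == target for a in div.iterfind("span/a")):
                continue
            # Step 2: pick the neighbours by position. This sketch requires an a
            # directly before and two br directly after, as in the question's markup.
            before = children[i - 1] if i > 0 else None
            after = children[i + 1:i + 3]
            if (
                before is not None
                and before.tag == "a"
                and [e.tag for e in after] == ["br", "br"]
            ):
                groups.append([before, div, *after])
    return groups


groups = select_group(ET.fromstring(DOC))
print([[el.tag for el in g] for g in groups])  # prints [['a', 'div', 'br', 'br']]
```

With a library that implements full XPath 1.0, the expressions from this answer could be evaluated directly, using each matched div as the context node for the relative paths.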
taro