
This is almost going to sound like a joke, but I promise you this is real life. There is a site on the internet, one which you have all used, that does not believe in CSS classes. Everything is defined directly in the style attribute of each element. It's horrifying.

My problem, though, is that it also makes the HTML extraordinarily difficult to parse. The structure that I've got to go on looks something like this:

<td>
    <a name="<random_string>"></a>
    <div style="generic-style, used by other elements">
        <div style="similarly generic style">{some_stuff}</div>
    </div>
    <a name="<random_string>"></a>
    ...
</td>

Basically, I've got these a tags that form the boundaries of the reviews, whose only defining information is the random string that is their name. I don't actually care about the anchor tags, but I would like to grab the reviews between them using XPath.

I've looked into sibling queries, but they don't seem to be well suited to alternating boundaries. I also looked into the Kayessian method of XPath queries, which (aside from having an awesome name) only seems well suited to grabbing a particular div, rather than all divs between the anchor tags.

Any thoughts on how I could grab the divs here?

Slater Victoroff

2 Answers


I figured it out! It turns out that an XPath predicate can contain a relative path, including one that steps up to the parent with "..". I am not sure this is the intended use, but it happens to work in this case! Here's the XPath:

//td/div[../a[@name]]

Nice and clean. The predicate ../a[@name] basically just says:

Go up a level, and make sure on that level of the hierarchy there's an a element with a name attribute
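
To make it concrete what that predicate does and does not check, here is a minimal sketch in Python. It uses only the standard library's xml.etree.ElementTree, which implements just a limited subset of XPath, so the predicate is spelled out by hand instead of being passed in as an XPath string. The sample markup, the "footer" div, and the function name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup shaped like the structure in the question.
LOOSE_SAMPLE = """
<table><tr>
  <td>
    <a name="r1"></a>
    <div style="x"><div style="y">first review</div></div>
    <a name="r2"></a>
    <div style="x"><div style="y">second review</div></div>
    <a name="r3"></a>
    <div style="x"><div style="y">footer, not a review</div></div>
  </td>
  <td>
    <div style="x">no anchors in this cell</div>
  </td>
</tr></table>
"""


def divs_in_cells_with_named_anchor(root):
    """Hand-written equivalent of //td/div[../a[@name]].

    Returns every div that is a direct child of a td, provided that td
    also has at least one direct <a> child carrying a name attribute.
    """
    matches = []
    for td in root.iter("td"):
        has_named_anchor = any(
            a.get("name") is not None for a in td.findall("a")
        )
        if has_named_anchor:
            # findall("div") only returns direct children, like td/div.
            matches.extend(td.findall("div"))
    return matches


loose_root = ET.fromstring(LOOSE_SAMPLE)
loose_texts = [
    "".join(div.itertext()).strip()
    for div in divs_in_cells_with_named_anchor(loose_root)
]
print(loose_texts)
```

In this sample the trailing "footer" div is selected as well, because the predicate only asks whether the parent has a named anchor somewhere among its children, not where that anchor sits relative to the div.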

Slater Victoroff
  • 1) Does this really solve your issue? - Any `div` with a sibling `a`, irrespective of order or div nesting? 2) Then it's the same as `//td/a[@name]/../div`. – JimmyB Aug 06 '15 at 16:25
  • @HannoBinder, it is not an ideal solution, but it does technically solve the problem for me. I'm not going to accept it because I think there are probably better solutions. This is... a solution that just happens to work, and it does appear that selector is equivalent. – Slater Victoroff Aug 06 '15 at 16:28

If //td/div[../a[@name]] works for you, then the following should also work:

//td[a/@name]/div

This way you don't need to go back and forth (or rather, down and up). For a more specific selector, you may want to try the following:

//td/div[preceding-sibling::*[1][self::a/@name]][following-sibling::*[1][self::a/@name]]

The XPath selects div elements that have all of the following properties:

  • td/div : is a child of a <td> element

  • [preceding-sibling::*[1][self::a/@name]] : the nearest preceding sibling element is an <a> element that has a name attribute

  • [following-sibling::*[1][self::a/@name]] : the nearest following sibling element is an <a> element that has a name attribute
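
As a rough illustration of those three conditions, here is a sketch in Python. The standard library's xml.etree.ElementTree does not support the preceding-sibling and following-sibling axes, so the neighbour checks are written out by hand; with a full XPath 1.0 engine you would pass the expression above directly. The sample markup and helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup: two reviews bounded by named anchors,
# plus a trailing div that has no anchor after it.
STRICT_SAMPLE = """
<table><tr><td>
  <a name="r1"></a>
  <div style="x"><div style="y">first review</div></div>
  <a name="r2"></a>
  <div style="x"><div style="y">second review</div></div>
  <a name="r3"></a>
  <div style="x"><div style="y">trailing div</div></div>
</td></tr></table>
"""


def is_named_anchor(element):
    """True for an <a> element that carries a name attribute."""
    return (
        element is not None
        and element.tag == "a"
        and element.get("name") is not None
    )


def divs_between_named_anchors(root):
    """Select td/div elements whose nearest element siblings on both
    sides are <a name=...> elements."""
    matches = []
    for td in root.iter("td"):
        children = list(td)  # element children only, in document order
        for index, child in enumerate(children):
            if child.tag != "div":
                continue
            previous = children[index - 1] if index > 0 else None
            following = (
                children[index + 1] if index + 1 < len(children) else None
            )
            if is_named_anchor(previous) and is_named_anchor(following):
                matches.append(child)
    return matches


strict_root = ET.fromstring(STRICT_SAMPLE)
strict_texts = [
    "".join(div.itertext()).strip()
    for div in divs_between_named_anchors(strict_root)
]
print(strict_texts)
```

Here the trailing div is skipped because nothing follows it, which is the difference from the looser //td[a/@name]/div form.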

har07