How to get next siblings until a specified element

Question

I am using XPath to scrape a website (legitimately, for once!) thanks to the amazing powers of Visual Web Ripper.

One of the fields I need is the contents of the P tags that follow an H3 tag. That is fine if I only want the first one, since I can use the following:

//DIV[@id='content']/H3[. = 'Prices']/following-sibling::P[1]

But how can I say I want the content of all P tags up until the next H3?

possible duplicate of [XPath : select all following siblings until another sibling](http://stackoverflow.com/questions/2161766/xpath-select-all-following-siblings-until-another-sibling) — glmxndr, Apr 22 '11 at 07:49
Good question, +1. See my answer for a complete solution based on a general formula for node-set intersection. — Dimitre Novatchev, Apr 22 '11 at 13:02
@tigermain - I am trying to do the same thing. How do you use the XPath from vw-ripper in PHP? — Imran Omar Bukhsh, Apr 17 '12 at 14:02

score 1 · Answer 1 · answered Apr 22 '11 at 13:01

Use:

//div[@id='content']/h3[. = 'Prices']
  /following-sibling::p
    [count
      (. | 
       //div[@id='content']
              /h3[. = 'Prices']/following-sibling::h3[1]/preceding-sibling::p
      )
     =
     count
      (
       //div[@id='content']
             /h3[. = 'Prices']/following-sibling::h3[1]/preceding-sibling::p
       )
      ]

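Here we use the Kayessian formula for the intersection of two node-sets $ns1 and $ns2 (in this case, the p siblings that follow the 'Prices' h3 and the p siblings that precede the h3 after it):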

$ns1[count(.|$ns2) = count($ns2)]
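
To see why the count test amounts to an intersection, here is a minimal Python sketch. It is only an illustration: the sample document and the helper name `kayessian_intersection` are made up for this example, and because the standard library's ElementTree does not support the `following-sibling` axis, the two node-sets are built by hand rather than by evaluating the XPath above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page, not taken from the site in the question.
SAMPLE = """
<div id="content">
  <p>intro</p>
  <h3>Prices</h3>
  <p>price A</p>
  <p>price B</p>
  <h3>Contact</h3>
  <p>phone</p>
  <h3>About</h3>
</div>
"""

def kayessian_intersection(ns1, ns2):
    """Mirror $ns1[count(.|$ns2) = count($ns2)]: keep the nodes of ns1
    whose union with ns2 is no larger than ns2 itself."""
    ns2_ids = {id(node) for node in ns2}
    kept = []
    for node in ns1:
        union = ns2_ids | {id(node)}       # . | $ns2
        if len(union) == len(ns2_ids):     # count(. | $ns2) = count($ns2)
            kept.append(node)
    return kept

root = ET.fromstring(SAMPLE)
children = list(root)

# Position of the 'Prices' heading and of the first h3 after it.
start = next(i for i, el in enumerate(children)
             if el.tag == "h3" and el.text == "Prices")
next_h3 = next(i for i in range(start + 1, len(children))
               if children[i].tag == "h3")

# $ns1: p siblings following the 'Prices' h3.
ns1 = [el for el in children[start + 1:] if el.tag == "p"]
# $ns2: p siblings preceding the next h3.
ns2 = [el for el in children[:next_h3] if el.tag == "p"]

selected = kayessian_intersection(ns1, ns2)
print([el.text for el in selected])  # prints ['price A', 'price B']
```

Adding a node to $ns2 leaves the count unchanged only when that node is already a member, so the predicate keeps exactly the nodes common to both sets.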

score 0 · Answer 2 · answered Feb 25 '13 at 21:00

With Visual Web Ripper you can use the non-standard function SPAN, which includes all sibling nodes until it encounters the specified element.

Try:

//DIV[@id='content']/H3[. = 'Prices']/following-sibling::P[SPAN('H3')]

answered Feb 25 '13 at 21:00 · Cartido

score -1 · Answer 3 · edited Apr 26 '11 at 20:08

Thanks for your feedback and input, guys, but I found an even easier/quicker/tidier way of doing it (comments welcome):

//DIV[@id='content']/H3[. = 'Prices']/following-sibling::P[./preceding-sibling::H3[1][. = 'Prices']]
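
For readers who want to see what that predicate does, here is a rough Python equivalent. It is a sketch only: the sample markup and the function name `paragraphs_under` are hypothetical, and it walks the children by hand with the standard library's ElementTree (which lacks the sibling axes) instead of evaluating the XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page, not taken from the site in the question.
SAMPLE = """
<DIV id="content">
  <H3>Prices</H3>
  <P>price A</P>
  <P>price B</P>
  <H3>Contact</H3>
  <P>phone</P>
</DIV>
"""

def paragraphs_under(parent, heading_text):
    """Collect the P children whose nearest preceding H3 sibling
    has exactly the given text."""
    selected = []
    current_heading = None  # text of the most recent H3 seen so far
    for child in parent:
        if child.tag == "H3":
            current_heading = child.text
        elif child.tag == "P" and current_heading == heading_text:
            selected.append(child)
    return selected

root = ET.fromstring(SAMPLE)
print([p.text for p in paragraphs_under(root, "Prices")])
# prints ['price A', 'price B']
```

As with the XPath, this relies on the heading text being unique: if two H3 elements both read 'Prices', the paragraphs under both of them are returned.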

edited Apr 26 '11 at 20:08 · John Saunders

answered Apr 22 '11 at 13:36 · Anthony Main

@tigerman: this is not a reliable and general solution. Here it is applicable only because the `H3` element is uniquely identified by its string value. Were there more than one `H3` element with the same string value, this solution might not select the desired nodes. At the same time, the solution that I provided always selects the expected nodes. You may benefit from this solution if you wish to learn. – Dimitre Novatchev Apr 23 '11 at 02:35
@tigerman: Also note that XPath (and XML) is case-sensitive, and in your question you are mixing cases (`p` and `P`), which makes the statements contained in the question false. It would be good if you corrected your question. I would recommend that you pay more attention to learning XPath and XML. – Dimitre Novatchev Apr 23 '11 at 02:39
I appreciate your concerns, but in my case this is not an issue: I am actually using the text values of the H3 elements as unique identifiers. – Anthony Main Apr 23 '11 at 09:42
@tigerman: In all such cases you must list your assumptions in the question itself. – Dimitre Novatchev Apr 23 '11 at 15:42
