How to get all elements between two nodes with XPATH?

Question

I have HTML code like this:

<!DOCTYPE html>
<html>
    <head>
        <meta charset="utf-8">
        <meta name="viewport" content="width=device-width">
        <title>test</title>
    </head>
    <body>
        <h3><a href="#" name='title1'>Title</a></h3>
        <div>para1</div>
        <div>para2</div>
        <div>para3</div>
        <h3><a href="#" name='title2'>Title</a></h3>
        <div>para4</div>
        <div>para5</div>
    </body>
</html>

What I want is:

<div>para1</div>
<div>para2</div>
<div>para3</div>

In other words, I want only the first group of <div>s (the ones under the first heading) and need to ignore the second group.

For now, I have worked it out this way:

#!/usr/bin/env python
# encoding: utf-8

import unittest

from lxml import etree

class SearchPara(unittest.TestCase):

    def setUp(self):
        with open('test.html') as f:
            self.html = f.read()

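    def test_parse_html(self):
        paras = ''
        page = etree.HTML(self.html)
        # Select the <h3> that wraps the anchor named 'title1'.
        a_ele = page.xpath("//h3/a[@name='title1']/..")

        if a_ele is None or len(a_ele) < 1:
            return paras

        # Walk the following siblings one at a time until the next <h3>.
        para = a_ele[0].xpath('following-sibling::*[1][name(.) != "h3"]')
        while para is not None and len(para) > 0:
            print(para)
            # encoding='unicode' makes tostring() return str rather than bytes.
            paras += etree.tostring(para[0], encoding='unicode')
            para = para[0].xpath('following-sibling::*[1][name(.) != "h3"]')

        print(paras)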


    def tearDown(self):
        pass

if __name__ == "__main__":
    unittest.main()

As you can see, this is a little complicated. Is there a better way to do this?

@mins you puzzled me: `p[a]` means `All P with an A child`, but the `<div>` elements are not children of `<a>` — roger, Apr 11 '15 at 10:01
@mins I want to get the first `<div>para1</div><div>para2</div><div>para3</div>` and the second `<div>para4</div><div>para5</div>` as my result — roger, Apr 11 '15 at 10:03
As you have edited it, it now seems you want to select all `<div>` elements regardless of other criteria. Maybe you should clarify what you want with a better example. — mins, Apr 11 '15 at 10:10
@mins yes, exactly. In fact, I am doing this because I want to get all paragraphs associated with a title. Can I do this? — roger, Apr 11 '15 at 10:16
You want to get all siblings from a starting point, until some sibling is found: See [XPath : select all following siblings until another sibling](http://stackoverflow.com/questions/2161766/xpath-select-all-following-siblings-until-another-sibling) — mins, Apr 11 '15 at 10:21
roger, your question is not clear. It is unclear what the rules are for selecting nodes in your example. Please edit your question and show _another_ sample. A [**good sample**](http://stackoverflow.com/help/mcve) contains all the complexity present in your actual data. Show us a sample where some of the `div` elements should _not_ be selected. Also, samples must be **complete** documents. — Mathias Müller, Apr 11 '15 at 22:38
Please show us what you tried so far. Furthermore, your assertion does not apply. The `<div>` with para1 is **not** followed by an `<a>` until the one preceding the `<div>`. And since there is no further `<a>`, nothing matches. — Markus W Mahlberg, Apr 12 '15 at 14:34
@MarkusWMahlberg as you insist, I have given you all the code; you can test it on your machine. — roger, Apr 13 '15 at 04:29
@MathiasMüller I have changed my question completely; maybe it is clearer now. — roger, Apr 13 '15 at 04:30
@roger: I do not insist on my purpose. You might want to read [How do I ask a good question](http://stackoverflow.com/help/how-to-ask), which enhances the probability for getting a useful answer _drastically_. You might find [ESR](https://en.m.wikipedia.org/wiki/Eric_S._Raymond)'s excellent essay [How To Ask Questions The Smart Way](http://catb.org/~esr/faqs/smart-questions.html) helpful, too. — Markus W Mahlberg, Apr 13 '15 at 10:18

score 1 · Accepted Answer · answered Apr 13 '15 at 05:54

As far as I know, there is no general way to select elements between 2 elements using XPath 1.0.

The same output can still be achieved if we define the condition differently. For example, by selecting the <div>s whose nearest preceding sibling <a> has a value equal to "Title: Part I":

//div[preceding-sibling::a[1][. = 'Title: Part I']]

Selecting the next group of <div>s only requires changing the <a> criterion:

//div[preceding-sibling::a[1][. = 'Title: Part II']]

A test method to see the above XPath in action:

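def test_parse_html(self):
    page = etree.HTML(self.html)
    paras = ''
    # Select every <div> whose nearest preceding <a> sibling is the wanted title.
    para = page.xpath("//div[preceding-sibling::a[1][. = 'Title: Part I']]")
    for p in para:
        # encoding='unicode' makes tostring() return str rather than bytes.
        paras += etree.tostring(p, encoding='unicode')

    print(paras)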
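
To try the expression outside a test case, here is a minimal self-contained sketch. The markup is an assumption reconstructed from the expressions in this answer (an <a> whose only text is the title, followed by sibling <div>s), not the exact document from the question, and `divs_after_title` is a hypothetical helper name. The fragment is well-formed, so it is parsed with `etree.fromstring` here.

```python
from lxml import etree

# Assumed markup: each <a> title is a sibling of the <div>s that follow it.
DOC = """<body>
<a href="#"><h3>Title: Part I</h3></a>
<div>para1</div>
<div>para2</div>
<div>para3</div>
<a href="#"><h3>Title: Part II</h3></a>
<div>para4</div>
<div>para5</div>
</body>"""


def divs_after_title(root, title):
    """Return the text of each <div> whose nearest preceding <a> sibling equals `title`."""
    # preceding-sibling is a reverse axis, so [1] picks the closest <a> before the <div>.
    # $title is an XPath variable that lxml fills in from the keyword argument.
    expr = "//div[preceding-sibling::a[1][. = $title]]"
    return [div.text for div in root.xpath(expr, title=title)]


root = etree.fromstring(DOC)
part_one = divs_after_title(root, "Title: Part I")
part_two = divs_after_title(root, "Title: Part II")
print(part_one)
print(part_two)
```

Passing the title as an XPath variable rather than formatting it into the string avoids quoting problems if a title contains an apostrophe.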

Side note: the XPath for populating a_ele in your code can be simplified this way:

a_ele = page.xpath("//a[h3 = 'Title: Part I']")

or even further, since the only text within the <a> is "Title: Part I":

a_ele = page.xpath("//a[. = 'Title: Part I']")
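
The two simplified expressions can be compared directly. The sketch below assumes the markup these expressions imply, an <a> wrapping an <h3> whose only text is the title; that fragment is a guess for illustration, not the asker's actual file.

```python
from lxml import etree

# Assumed markup: the <a> wraps an <h3>, and the title is its only text.
root = etree.fromstring(
    '<body><a href="#"><h3>Title: Part I</h3></a><div>para1</div></body>'
)

# Compares the string value of the <h3> child with the title.
by_child = root.xpath("//a[h3 = 'Title: Part I']")
# Compares the string value of the <a> itself, i.e. all of its
# descendant text concatenated.
by_self = root.xpath("//a[. = 'Title: Part I']")

# Both expressions should pick out the same single <a> element.
same_element = len(by_child) == 1 and by_child == by_self
print(same_element)
```

The shorter form relies on the <a> containing no other text. Extra whitespace or a second child with text would change its string value and break the match.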

I am so so so so sorry, I need to bother you again, my situation is a little bit different now, I tried to use your tips to simplify my program, can you give me a tip again? — roger, Apr 13 '15 at 06:40
Same approach with slight change to the xpath expression : `//div[preceding-sibling::h3[1][a/@name = 'title1']]` ? — har07, Apr 13 '15 at 06:58
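
As a rough sketch of how that last expression behaves on the HTML from the question, the snippet below collects the <div>s for every title in one pass. `paragraphs_by_title` is a hypothetical helper name, and the <head> is left out of the sample for brevity.

```python
from lxml import etree

# The <body> of the document from the question.
HTML = """
<html><body>
    <h3><a href="#" name="title1">Title</a></h3>
    <div>para1</div>
    <div>para2</div>
    <div>para3</div>
    <h3><a href="#" name="title2">Title</a></h3>
    <div>para4</div>
    <div>para5</div>
</body></html>
"""


def paragraphs_by_title(page):
    """Map each anchor name to the text of the <div>s that follow its <h3>."""
    groups = {}
    for name in page.xpath("//h3/a/@name"):
        name = str(name)  # plain str instead of lxml's string result type
        # A <div> belongs to a title when its nearest preceding <h3>
        # sibling contains the anchor with that name.
        divs = page.xpath(
            "//div[preceding-sibling::h3[1][a/@name = $name]]", name=name
        )
        groups[name] = [div.text for div in divs]
    return groups


page = etree.HTML(HTML)
groups = paragraphs_by_title(page)
print(groups["title1"])
print(groups["title2"])
```

This runs one query per title, which keeps the expression readable. For very large documents a single walk over the siblings might be preferable.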
