I have the following block:

<div class="text">
  <h1>head1</h1>
    Text1 <br/><br/> text12  <br/><br/> text 13
  <h1>head11</h1>
    Text11
  <h3>head3</h3>
    Text2
</div>

How can I get the text after the first H1, ignoring the <br/><br/> tags, so that the result is:

Text1 
text12
text 13

I use Grab (Python): page = g.doc.select('//div[@class="text"]/h1[1]/following-sibling::text()'). The result is:

Text1
text12
text 13
Text11
Text2
dMazay

1 Answer

You could select only the text() nodes that have exactly one preceding h1 sibling:

//div[@class='text']/text()[count(preceding-sibling::h1)=1]
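
As a minimal sketch of how this expression behaves, the following runs it with lxml directly (Grab is built on lxml, so the same expression string should work in g.doc.select as well). The names SNIPPET and first_section are illustrative only, and the markup is copied from the question.

```python
from lxml import etree

# Markup copied from the question.
SNIPPET = """
<div class="text">
  <h1>head1</h1>
    Text1 <br/><br/> text12  <br/><br/> text 13
  <h1>head11</h1>
    Text11
  <h3>head3</h3>
    Text2
</div>
"""

# Parse with libxml2's HTML parser; the result is the root <html> element.
root = etree.HTML(SNIPPET)

# Text nodes directly inside the div that have exactly one h1 before them,
# i.e. everything between the first and the second h1.
nodes = root.xpath("//div[@class='text']/text()[count(preceding-sibling::h1)=1]")

# The <br/> elements are not text nodes, so they never appear in the result.
# Trim each piece and drop anything that is only whitespace.
first_section = [t.strip() for t in nodes if t.strip()]
print(first_section)
```

The filter counts only h1 siblings, so text that follows a later h2 or h3 inside the same section would still be included.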

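Another alternative is the Kayessian method, which intersects two node-sets using the pattern $ns1[count(.|$ns2) = count($ns2)]. Here the first set is the text after the first h1 and the second set is the text before the second h1: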

//div[@class='text']/h1[1]/following-sibling::text()[count(.|//div[@class='text']/h1[1+1]/preceding-sibling::text()) = count(//div[@class='text']/h1[1+1]/preceding-sibling::text())]

Here's a better example and explanation of the Kayessian method.
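
The Kayessian expression above is easier to read when it is assembled from its two node-sets. This is only a sketch using lxml; text_between_h1 is a hypothetical helper, not part of Grab or lxml, and the markup is copied from the question.

```python
from lxml import etree

# Markup copied from the question.
SNIPPET = """
<div class="text">
  <h1>head1</h1>
    Text1 <br/><br/> text12  <br/><br/> text 13
  <h1>head11</h1>
    Text11
  <h3>head3</h3>
    Text2
</div>
"""


def text_between_h1(root, n):
    """Return trimmed text between the n-th and (n+1)-th h1 (hypothetical helper)."""
    # ns1: every text node after the n-th h1.
    after_start = f"//div[@class='text']/h1[{n}]/following-sibling::text()"
    # ns2: every text node before the (n+1)-th h1.
    before_end = f"//div[@class='text']/h1[{n + 1}]/preceding-sibling::text()"
    # Kayessian intersection: a node of ns1 is kept only if adding it to ns2
    # does not change the size of ns2, meaning it is already a member of ns2.
    expr = f"{after_start}[count(.|{before_end}) = count({before_end})]"
    return [t.strip() for t in root.xpath(expr) if t.strip()]


root = etree.HTML(SNIPPET)
between = text_between_h1(root, 1)
print(between)
```

If there is no (n+1)-th h1, the second node-set is empty and the intersection returns nothing, so the last section would need a different expression.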

Daniel Haley
  • What if the XML changes to this:

    Headerh1

    Text1
    after header1

    Headerh3.1

    Text2
    after header3.1

    Headerh3.2

    Text3
    after header3.2

    Headerh3.3

    Text4
    after header3.3
    How should //div[@class='text']/text()[count(preceding-sibling::h1)=1] be changed to get only the text after the H1, i.e. Text1 and after header1?
    – dMazay Aug 05 '17 at 12:34
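
One possible way to handle the layout described in this comment is to count every heading level, not just h1, among the preceding siblings. The sketch below uses lxml and rests on assumptions: the tags in the comment were stripped, so the markup here (one h1 followed by h3 headings, with a <br/> inside each text run) is a guess at the intended structure, and COMMENT_SNIPPET and after_h1 are illustrative names.

```python
from lxml import etree

# ASSUMPTION: reconstructed markup; the original tags were lost from the comment.
COMMENT_SNIPPET = """
<div class="text">
  <h1>Headerh1</h1>
    Text1 <br/> after header1
  <h3>Headerh3.1</h3>
    Text2 <br/> after header3.1
  <h3>Headerh3.2</h3>
    Text3 <br/> after header3.2
  <h3>Headerh3.3</h3>
    Text4 <br/> after header3.3
</div>
"""

root = etree.HTML(COMMENT_SNIPPET)

# Count h1 and h3 siblings together, so the section ends at the next heading
# of either level instead of at the next h1 only.
expr = "//div[@class='text']/text()[count(preceding-sibling::*[self::h1 or self::h3]) = 1]"

after_h1 = [t.strip() for t in root.xpath(expr) if t.strip()]
print(after_h1)
```

Other heading levels, such as h2, could be added to the self:: test in the same way.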