XPath: get text after first HTML tag

Question

I have the following block:

<div class="text">
  <h1>head1</h1>
    Text1 <br/><br/> text12  <br/><br/> text 13
  <h1>head11</h1>
    Text11
  <h3>head3</h3>
    Text2
</div>

How can I get the text after the first H1, ignoring the <br/><br/>, as

Text1 
text12
text 13

I use Grab (Python):

page = g.doc.select('//div[@class="text"]/h1[1]/following-sibling::text()')

The result is

Text1
text12
text 13
Text11
Text2

Daniel Haley · Accepted Answer · 2017-08-04T17:35:20.953

You could try selecting the text() that only has one preceding h1 sibling...

//div[@class='text']/text()[count(preceding-sibling::h1)=1]
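For reference, here is a minimal sketch of how this first expression behaves on the question's markup. It uses lxml directly instead of Grab, which is an assumption made to keep the example self-contained; the variable names are illustrative only.

```python
from lxml import html

# The markup from the question.
SNIPPET = """
<div class="text">
  <h1>head1</h1>
    Text1 <br/><br/> text12  <br/><br/> text 13
  <h1>head11</h1>
    Text11
  <h3>head3</h3>
    Text2
</div>
"""

root = html.fromstring(SNIPPET)

# Keep only the text nodes that have exactly one h1 before them,
# i.e. the text between the first and the second h1.
nodes = root.xpath("//div[@class='text']/text()[count(preceding-sibling::h1)=1]")

# The <br/> elements are not text nodes, so they drop out on their own;
# only the surrounding whitespace needs trimming.
texts = [n.strip() for n in nodes if n.strip()]
print(texts)  # ['Text1', 'text12', 'text 13']
```

The predicate counts h1 siblings only, so any text that follows a later h3 but still has a single h1 before it would be selected as well.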

Another alternative is to try using the Kayessian method...

//div[@class='text']/h1[1]/following-sibling::text()[count(.|//div[@class='text']/h1[1+1]/preceding-sibling::text()) = count(//div[@class='text']/h1[1+1]/preceding-sibling::text())]
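The Kayessian method is the XPath 1.0 idiom for intersecting two node-sets, $ns1[count(.|$ns2) = count($ns2)]: a node from $ns1 is kept only if adding it to $ns2 does not make $ns2 any larger. Here $ns1 is the text following the first h1 and $ns2 is the text preceding the second h1 (h1[1+1] is simply h1[2]). A rough sketch, again assuming lxml rather than Grab; the helper text_between is hypothetical and only assembles the expression above for a given h1 index.

```python
from lxml import html

SNIPPET = """
<div class="text">
  <h1>head1</h1>
    Text1 <br/><br/> text12  <br/><br/> text 13
  <h1>head11</h1>
    Text11
  <h3>head3</h3>
    Text2
</div>
"""


def text_between(root, n):
    """Return the stripped text nodes between the n-th and (n+1)-th h1.

    Hypothetical helper: it builds the Kayessian intersection
    $ns1[count(.|$ns2) = count($ns2)] as a plain XPath 1.0 string.
    """
    # $ns1: every text node after the n-th h1
    ns1 = f"//div[@class='text']/h1[{n}]/following-sibling::text()"
    # $ns2: every text node before the (n+1)-th h1
    ns2 = f"//div[@class='text']/h1[{n + 1}]/preceding-sibling::text()"
    expr = f"{ns1}[count(.|{ns2}) = count({ns2})]"
    return [t.strip() for t in root.xpath(expr) if t.strip()]


root = html.fromstring(SNIPPET)
print(text_between(root, 1))  # ['Text1', 'text12', 'text 13']
```

For the last h1 there is no following h1, so $ns2 is empty and this form selects nothing.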

Here's a better example and explanation of the Kayessian method.

edited Aug 04 '17 at 17:35

answered Aug 04 '17 at 17:20


What if the XML changes to:

Headerh1
Text1
after header1
Headerh3.1
Text2
after header3.1
Headerh3.2
Text3
after header3.2
Headerh3.3
Text4
after header3.3
How should I change //div[@class='text']/text()[count(preceding-sibling::h1)=1] to get only the text after H1 (Text1, after header1)? – dMazay Aug 05 '17 at 12:34
