
I want to select the string between </span> and <br/>, for instance in the snippet below:

<span class="pl">制片国家/地区:</span>
中国大陆
<br/>

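Of course, in my code all of these Chinese strings are converted to Unicode. Here I'd like to select "中国大陆" ("Mainland China"), the value that follows the "制片国家/地区:" ("Country/region of production:") label, from this HTML file. I tried it with XPath and a regex like this: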

sel.xpath("*").re(r'制片国家/地区:</span>\s*(.*)<br/>')

It should return "中国大陆", but I get an empty string. What should I do?
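For reference, a likely reason the pattern above comes back empty: in Python's re module, . does not match a newline, and in the snippet there is a newline between 中国大陆 and <br/>, so (.*)<br/> can never match. The Python 3 sketch below assumes the markup is exactly as shown in the question. It reproduces the failure and tries a more tolerant pattern (a lazy group, optional whitespace before the tag, and <br>, <br/> or <br /> accepted). It is only a sketch: the markup Scrapy hands to .re() may not be byte-for-byte what is in the file (for example, <br/> may be serialized as <br>), which is part of why regexes over HTML are fragile.

```python
import re

# The markup as it appears in the question; note the newlines around the text.
html = '<span class="pl">制片国家/地区:</span>\n中国大陆\n<br/>'

# Original pattern: (.*) cannot cross the newline before <br/>, so nothing matches.
original = re.findall(r'制片国家/地区:</span>\s*(.*)<br/>', html)

# More tolerant pattern: lazy group, whitespace allowed before the tag,
# and <br>, <br/> or <br /> all accepted.
fixed = re.findall(r'制片国家/地区:</span>\s*(.*?)\s*<br\s*/?>', html)

print(original)  # []
print(fixed)     # ['中国大陆']
```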

CherryUnix
    See http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – arghtype Dec 18 '14 at 09:48

1 Answer


You can use this to select your text:

//span[@class="pl"]/following-sibling::text()[1]
  • //span[@class="pl"] - Find a span with class pl (exactly) at any level of the document...
  • /following-sibling::text()[1] - ... and take the next text node following it.
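To see the same idea without Scrapy, here is a small Python 3 sketch using only the standard library's xml.etree.ElementTree. ElementTree's limited XPath supports the [@class='pl'] predicate but not following-sibling::text(); instead, it stores the text that directly follows an element as that element's tail. For this snippet, where the text comes right after the span, tail holds the same node that following-sibling::text()[1] selects. Assumptions: the snippet is wrapped in a hypothetical <div> root so it parses as XML, and text_after_span is an illustrative helper name, not part of any library. Real-world HTML that is not well-formed would need an HTML parser instead.

```python
import xml.etree.ElementTree as ET

# The question's snippet wrapped in a hypothetical <div> root so it parses as XML.
snippet = '<div><span class="pl">制片国家/地区:</span>\n中国大陆\n<br/></div>'
root = ET.fromstring(snippet)

def text_after_span(root, cls):
    """Return the stripped text right after the first <span class=cls>, or None."""
    # ElementTree supports [@attr='value'] but not following-sibling::text().
    span = root.find(f".//span[@class='{cls}']")
    if span is None or span.tail is None:
        return None
    # .tail is the text between </span> and the next tag (<br/> here),
    # including the surrounding newlines, so strip it.
    return span.tail.strip()

print(text_after_span(root, "pl"))  # 中国大陆
```

If you are already using Scrapy as in the question, sel.xpath('//span[@class="pl"]/following-sibling::text()[1]').extract() should return a one-element list whose string still carries the surrounding newlines, so strip it before use.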
Kobi