Is there a scrapy following-sibling count?

Question

I'm trying to scrape the title from the following HTML:

<FONT COLOR=#5FA505><B>Claim:</B></FONT> &nbsp; Coed makes unintentionally risqu&eacute; remark about professor's "little quizzies."
<BR><BR>
<CENTER><IMG SRC="/images/content-divider.gif"></CENTER>

I'm using this code:

def parse_article(self, response):
    # Select the text nodes that follow the <font> whose <b> child reads "Claim:"
    for href in response.xpath('//font[b = "Claim:"]/following-sibling::text()'):
        print(href.extract())

and I successfully pull the Claim: value I want from the HTML above, but it also pulls in the HTML below (along with other blocks on the same page that have a similar structure). My xpath() is meant to select only the font tag containing Claim:, so why does it pull in the Origins text as well? And how can I fix it? I tried to get only the next following-sibling instead of all of them, but that didn't work.

<FONT COLOR=#5FA505 FACE=""><B>Origins:</B></FONT> &nbsp; Print references to the "little quizzies" tale date to 1962, but the tale itself has been around since the early 1950s. It continues to surface among college students to this day. Similar to a number of other college legends
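A minimal reproduction of this behaviour, assuming the Claim and Origins blocks sit under the same parent element (the question does not show the full page). It uses lxml directly, the HTML parser Scrapy's selectors are built on, so it runs without a Scrapy project; the `HTML` string is illustrative and its Origins text is abridged. Because `following-sibling::text()` selects every text node after the Claim `<font>` that shares its parent, the Origins text comes back too:

```python
import lxml.html

# Illustrative fragment: the two snippets from the question under one parent
# (Origins text abridged). Assumes the real page nests them the same way.
HTML = """<FONT COLOR=#5FA505><B>Claim:</B></FONT> &nbsp; Coed makes unintentionally risqu&eacute; remark about professor's "little quizzies."
<BR><BR>
<CENTER><IMG SRC="/images/content-divider.gif"></CENTER>
<FONT COLOR=#5FA505 FACE=""><B>Origins:</B></FONT> &nbsp; Print references to the "little quizzies" tale date to 1962."""

# document_fromstring always returns the <html> root, so //font searches everything.
root = lxml.html.document_fromstring(HTML)

# Same expression as in the question: ALL following text siblings, not just the next one.
texts = root.xpath('//font[b = "Claim:"]/following-sibling::text()')

# Whitespace-only nodes are the "empty space" in the output; skip them when printing.
non_blank = [t.strip() for t in texts if t.strip()]
for text in non_blank:
    print(text)
```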

@JohnDene My output changes, but it's mostly empty space with an occasional `,` – Rafa, Oct 06 '15 at 16:51
I think that's because you are using a for loop. If I understand correctly, you want to extract only one value? – John Dene, Oct 06 '15 at 17:15

score 0 · Answer 1 · edited May 23 '17 at 12:06

I think your XPath is missing the text() qualifier (explained here). It should be:

'//font[b/text()="Claim:"]/following-sibling::text()'

answered Oct 06 '15 at 16:54 by Łukasz

It still gives me the same output: it pulls in the `Origins` text as well. – Rafa, Oct 06 '15 at 16:59
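Why adding text() leaves the result unchanged here: in XPath 1.0, comparing a node-set with a string is true if any node's string-value equals that string, and the string-value of `<B>Claim:</B>` is just `Claim:`. So `b = "Claim:"` and `b/text() = "Claim:"` select the same `<font>`, and the following-sibling step after it returns the same nodes. A quick check with lxml, on an illustrative fragment that assumes both blocks share a parent:

```python
import lxml.html

# Illustrative fragment (Origins text abridged): assumes Claim and Origins share one parent.
HTML = """<FONT COLOR=#5FA505><B>Claim:</B></FONT> &nbsp; Coed makes unintentionally risqu&eacute; remark about professor's "little quizzies."
<BR><BR>
<FONT COLOR=#5FA505 FACE=""><B>Origins:</B></FONT> &nbsp; Print references to the "little quizzies" tale date to 1962."""

root = lxml.html.document_fromstring(HTML)

# Predicate compares the string-value of the <b> element itself.
with_b = root.xpath('//font[b = "Claim:"]/following-sibling::text()')
# Predicate compares the <b> element's text node (the text() form from the answer above).
with_text = root.xpath('//font[b/text() = "Claim:"]/following-sibling::text()')

# Same text nodes either way, so the Origins text is still included.
print(with_b == with_text)
print(any("Print references" in t for t in with_text))
```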

score 0 · Answer 2 · answered Oct 06 '15 at 20:36

The following-sibling axis returns all siblings that come after the context node, not just the next one. If you only want the first sibling, try the XPath expression:

//font[b = "Claim:"]/following-sibling::text()[1]

Or, if you want only the first match on the whole page rather than the first text after each Claim: tag:

(//font[b = "Claim:"]/following-sibling::text())[1]
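The two expressions only differ once the page has more than one Claim: block. In the first, `[1]` applies to the `following-sibling::text()` step, so you get the first text node after each Claim `<font>`; in the second, the parentheses build the whole node-set first and `[1]` keeps only its first node in document order. A sketch with lxml, using a hypothetical two-claim fragment whose texts are placeholders:

```python
import lxml.html

# Hypothetical page with two Claim/Origins pairs under one parent; texts are placeholders.
HTML = """<FONT COLOR=#5FA505><B>Claim:</B></FONT> &nbsp; First claim text.
<BR><BR>
<FONT COLOR=#5FA505 FACE=""><B>Origins:</B></FONT> &nbsp; First origins text.
<BR><BR>
<FONT COLOR=#5FA505><B>Claim:</B></FONT> &nbsp; Second claim text.
<BR><BR>
<FONT COLOR=#5FA505 FACE=""><B>Origins:</B></FONT> &nbsp; Second origins text."""

root = lxml.html.document_fromstring(HTML)

# [1] belongs to the axis step: the first following text node of EACH Claim <font>.
per_claim = root.xpath('//font[b = "Claim:"]/following-sibling::text()[1]')

# Parenthesised: collect all following text nodes first, then keep only the
# first one in document order, i.e. a single result for the whole page.
first_overall = root.xpath('(//font[b = "Claim:"]/following-sibling::text())[1]')

print([t.strip() for t in per_claim])      # one entry per Claim: block
print([t.strip() for t in first_overall])  # only the first Claim: block
```

In a Scrapy callback the same expression strings can be passed to `response.xpath(...)` unchanged.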
