I'm trying to scrape the title from the following HTML:
<FONT COLOR=#5FA505><B>Claim:</B></FONT> Coed makes unintentionally risqué remark about professor's "little quizzies."
<BR><BR>
<CENTER><IMG SRC="/images/content-divider.gif"></CENTER>
I'm using this code:
def parse_article(self, response):
    # Select the text that follows the <font> whose <b> child reads "Claim:"
    for href in response.xpath('//font[b = "Claim:"]/following-sibling::text()'):
        print(href.extract())
This successfully pulls the Claim: value I want from the HTML above, but it also pulls in the text from the HTML below (along with other blocks on the same page that have a similar structure). My xpath() is meant to select only the font tag labelled Claim:, so why is it also pulling in the Origins text below? And how can I fix it? I tried selecting only the next following-sibling instead of all of them, but that didn't work.
<FONT COLOR=#5FA505 FACE=""><B>Origins:</B></FONT> Print references to the "little quizzies" tale date to 1962, but the tale itself has been around since the early 1950s. It continues to surface among college students to this day. Similar to a number of other college legends
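A minimal sketch of what the axis actually selects, assuming a simplified, well-formed version of the page in which both <font> labels are siblings under one parent (the <td> wrapper is an assumption; the surrounding page structure isn't shown above). It uses the standard library's ElementTree rather than Scrapy's selectors, and the helper following_sibling_texts is hypothetical: it emulates //font[b = "Claim:"]/following-sibling::text() by collecting the .tail text of the matching <font> and of every later sibling element. Because following-sibling:: returns all later siblings under the same parent, not just the adjacent one, the text after the Origins label comes back too. In XPath, keeping only the nearest text sibling would be written following-sibling::text()[1]; note that the real page may also contain whitespace-only text nodes (newlines between tags), which the real XPath would return but this fragment does not have.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed stand-in for the page (assumption: one shared parent <td>).
PAGE = (
    "<td>"
    '<font color="#5FA505"><b>Claim:</b></font>'
    " Coed makes unintentionally risqué remark about professor's \"little quizzies.\""
    "<br/><br/>"
    '<center><img src="/images/content-divider.gif"/></center>'
    '<font color="#5FA505"><b>Origins:</b></font>'
    ' Print references to the "little quizzies" tale date to 1962,'
    " but the tale itself has been around since the early 1950s."
    "</td>"
)


def following_sibling_texts(parent, label):
    """Hypothetical helper emulating //font[b = label]/following-sibling::text().

    ElementTree stores the text that follows an element in that element's
    .tail, so the text siblings after <font> are its own tail plus the tails
    of every later sibling element under the same parent.
    """
    children = list(parent)
    for i, child in enumerate(children):
        if child.tag == "font" and child.findtext("b") == label:
            tails = [el.tail for el in children[i:]]
            return [t for t in tails if t]  # None/empty tail means no text node there
    return []


root = ET.fromstring(PAGE)
all_texts = following_sibling_texts(root, "Claim:")
first_only = all_texts[:1]  # analogue of following-sibling::text()[1]

print(len(all_texts))  # 2: the Claim text AND the Origins text
print(first_only[0].strip())
```

Slicing to all_texts[:1] mirrors the positional predicate: on the forward following-sibling axis, [1] picks the text node closest to the Claim label.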