Scrapy: Extracting both text AND hyperlink text using xpath

Asked Sep 27 '18 at 21:17

Active Sep 27 '18 at 21:17


I am trying to scrape all of the paragraph text, including the hyperlink text, inside a div with a specific class. If I use the following:

item['body'] = response.xpath('//div[@class="example-class"]//p/text()').extract()

this extracts all of the paragraph text, but not the text of the hyperlinks inside it. The result looks like:

To find more information you can ,, and investigate further.

However, if I use //a instead of //p, as follows:

item['body'] = response.xpath('//div[@class="example-class"]//a/text()').extract()

this extracts the text of all the hyperlinks, but none of the surrounding paragraph text.

I understand why this is happening, but I am not sure how to properly extract both the paragraph text AND the hyperlink text. Thank you very much.


Sean

Possible duplicate of [Two conditions using OR in XPATH](https://stackoverflow.com/questions/12562597/two-conditions-using-or-in-xpath) – LMC Sep 27 '18 at 22:31
Issue solved. The links are child nodes of the paragraph, so I needed to descend into them by changing //p/text() to //p//text(). Ref - https://stackoverflow.com/questions/51354279/xpath-taking-text-with-hyperlinks-python – Sean Sep 27 '18 at 22:58
