
I am trying to scrape all of the paragraph text, including the hyperlink text, inside a div with a specific class. If I use the following:

item['body']=response.xpath('//div[@class="example-class"]//p/text()').extract()

all of the paragraph text is extracted, but not the text of the hyperlinks inside it. The results look like:

To find more information you can ,, and investigate further.

However, if I use //a instead of //p, as follows:

item['body']=response.xpath('//div[@class="example-class"]//a/text()').extract()

all of the hyperlink text is extracted, but none of the paragraph text.

I understand why this is happening, but I am not sure how to properly extract both the paragraph text AND the hyperlink text. Thank you very much.

Sean
  • Possible duplicate of [Two conditions using OR in XPATH](https://stackoverflow.com/questions/12562597/two-conditions-using-or-in-xpath) – LMC Sep 27 '18 at 22:31
  • Issue solved. The links themselves are nodes that I needed to descend into, by changing //p/text() to //p//text(). Ref - https://stackoverflow.com/questions/51354279/xpath-taking-text-with-hyperlinks-python – Sean Sep 27 '18 at 22:58
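
The fix in the last comment works because /text() selects only the text nodes that are direct children of the p element, while //text() also descends into nested elements such as a. In the spider, that would presumably be response.xpath('//div[@class="example-class"]//p//text()').extract(). Below is a minimal sketch of the difference. It uses the standard library's xml.etree.ElementTree as a stand-in for Scrapy's selectors so that it runs without a crawl; the markup, the link text, and the helper names (direct_text, all_text) are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for the scraped page.
html = (
    '<body><div class="example-class">'
    '<p>To find more information you can '
    '<a href="/docs">read the docs</a>, '
    '<a href="/ask">ask a question</a>, '
    'and investigate further.</p>'
    '</div></body>'
)

root = ET.fromstring(html)
div = root.find(".//div[@class='example-class']")


def direct_text(p):
    """Join only the text nodes that are direct children of p.

    Roughly what the XPath step p/text() selects: the text before the
    first child element plus the text that follows each child element.
    """
    parts = [p.text or ""]
    parts += [child.tail or "" for child in p]
    return "".join(parts)


def all_text(p):
    """Join every descendant text node in document order.

    Roughly what p//text() selects, so the link text is included.
    """
    return "".join(p.itertext())


for p in div.iter("p"):
    print(direct_text(p))
    # To find more information you can , , and investigate further.
    print(all_text(p))
    # To find more information you can read the docs, ask a question, and investigate further.
```

In Scrapy itself, .extract() returns a list of text nodes, so the pieces would still need joining, for example with "".join(...), to get one string per paragraph.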
