I am trying to scrape all of the paragraph text, including the hyperlink text, inside a div with a specific class. If I use the following -
item['body']=response.xpath('//div[@class="example-class"]//p/text()').extract()
this extracts all of the paragraph text, but not the text of the hyperlinks inside it. The results look like:
To find more information you can ,, and investigate further.
However, if I use //a instead of //p as follows -
item['body']=response.xpath('//div[@class="single-content"]//a/text()').extract()
this results in all of the hyperlinks being extracted but none of the paragraph text.
I understand why this is happening, but I am not sure how to extract both the paragraph text AND the hyperlink text together. Thank you very much.