
I'm trying to get the names of the users and the content of the comments that exist on this page:

User and text that I need to extract: Image

When I test the extraction with the Chrome plugin XPath Helper, I get the user names with the expression:

//*[@id="livefyre"]/div/div/div/div/article/div/header/a/span

and I get the comments with:

//*[@id="livefyre"]/div/div/div/div/article/div/section/div/p

When I run the same test in the Scrapy shell, with the query:

response.xpath('//*[@id="livefyre"]/div/div/div/div/article/div/section/div/p').extract()

I get an empty list, [].

I've also tried with:

response.xpath('//*[@id="livefyre"]/div/div/div/div/article/div/section/div/p/text()').extract()

The same thing happens in my spider code.

Checking the page source, I see that none of those comments exist in the HTML.

When I inspect the page in the browser, for example, I see the comment text: Image

But when I check the HTML source of the page, I do not see anything: Image

Where am I making a mistake?

Thanks for your help.

Rishabh Agarwal
  • Possible duplicate of [Can scrapy be used to scrape dynamic content from websites that are using AJAX?](https://stackoverflow.com/questions/8550114/can-scrapy-be-used-to-scrape-dynamic-content-from-websites-that-are-using-ajax) – ThunderMind Jan 02 '19 at 18:03

1 Answer

As you stated, the comments are not in the page source. That means they are rendered by JavaScript after the initial HTML loads, so Scrapy, which only downloads the raw HTML, never sees them. There are two ways to scrape this kind of website.

First,

use scrapy-splash to render the JavaScript before Scrapy parses the page.

Second,

find the API/network call that fetches the comments, and reproduce that request in Scrapy to get the data directly.
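For the first option, a minimal sketch of the `settings.py` entries that scrapy-splash expects might look like the following. It assumes a Splash instance is already running and reachable at `http://localhost:8050` (for example in a Docker container); that address is an assumption, so adjust it to your own setup.

```python
# settings.py (sketch): wire scrapy-splash into a Scrapy project.
# Assumption: a Splash server is listening on localhost:8050.
SPLASH_URL = "http://localhost:8050"

# Middlewares that route requests through Splash and keep cookies in sync.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

# Avoids storing duplicate Splash arguments with every queued request.
SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Splash-aware duplicate filter and HTTP cache storage.
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
HTTPCACHE_STORAGE = "scrapy_splash.SplashAwareFSCacheStorage"
```

With these in place, the spider would yield `SplashRequest(url, self.parse, args={'wait': 2})` (imported from `scrapy_splash`) instead of a plain `scrapy.Request`, so the callback receives the rendered HTML and the original XPath expressions have a chance of matching. The `wait` value of 2 seconds is a guess; the comments widget may need more or less time to load.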

ThunderMind
  • Hi ThunderMind, I'm going to try what you mentioned about scrapy-splash. As for finding the API/network call that fetches the comments: how can I see it? Thanks! – Diego Carabajal Jan 02 '19 at 19:31
  • You can find all the `network calls` in your browser's `developer tools`. Open the `Network` tab, which lists every request the page makes. Check each `XHR` call there and find the `api call` through which the website gets its data. – ThunderMind Jan 02 '19 at 20:06
  • OK, ThunderMind. Thanks again! – Diego Carabajal Jan 03 '19 at 01:36
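
Once the XHR call discussed in the comments has been located, its response is typically JSON rather than HTML, so the data is read with a JSON parser instead of XPath. The sketch below shows that parsing step only. The payload shape (`comments`, `author`, `name`, `body`) and the helper name `extract_comments` are hypothetical; the real keys have to be read off the response shown in the Network tab.

```python
import json


def extract_comments(payload):
    """Return a list of {"user", "comment"} dicts from a JSON payload.

    The key names used here are hypothetical placeholders; replace them
    with the keys the real comments endpoint returns.
    """
    data = json.loads(payload)
    results = []
    for item in data.get("comments", []):
        user = item.get("author", {}).get("name")
        text = item.get("body")
        # Skip entries that lack either field (e.g. deleted comments).
        if user and text:
            results.append({"user": user, "comment": text})
    return results


# Made-up sample response, used only to exercise the function.
sample = json.dumps({
    "comments": [
        {"author": {"name": "alice"}, "body": "First!"},
        {"author": {"name": "bob"}, "body": "Nice article."},
        {"author": {}, "body": "No author, so this entry is skipped."},
    ]
})

for row in extract_comments(sample):
    print(row["user"], "->", row["comment"])
```

In a spider, the request to the discovered endpoint would be a normal `scrapy.Request(api_url, callback=self.parse_comments)`, and the callback would pass `response.text` to a function like this one. Here `api_url` stands for whatever URL the Network tab reveals.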