
I am trying to scrape this page. When I run scrapy shell "https://redsea.com/en/apple-iphone-x-64gb-silver.html", it downloads the HTML page, and I can view the downloaded HTML in the browser with view(response).

But when I try to extract any data (the product name, for example) with response.css('.page-title'), it returns an empty result.

When a website fetches its data through a REST API, Scrapy only downloads the page structure without the data, so it makes sense that Scrapy cannot get that data. But in this case Scrapy downloads the HTML file with the data and still cannot read it using CSS selectors or XPath expressions. I don't understand this behavior.

Javed
    The values are not in the page source, which means the data is loaded dynamically. So you have to use packages like Splash or Selenium to fetch dynamically loaded data. – Arun Augustine Nov 08 '17 at 10:15

1 Answer


But in this case Scrapy downloads the HTML file with the data and still cannot read it using CSS selectors or XPath expressions.

It doesn’t. When you open the HTML in a browser, JavaScript loads the content into the DOM, either from a separate URL or from values hard-coded in the JavaScript, which is why you can see the content with view(response). Scrapy’s selectors, on the other hand, only see the HTML as it was downloaded, before any JavaScript runs.
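A quick way to confirm this from the shell is to test the raw body directly, for example with 'page-title' in response.text. The sketch below makes the same point using only the standard library: it parses a raw HTML string without executing any JavaScript, just as Scrapy's selectors do. The sample markup is hypothetical and is not taken from redsea.com; it only illustrates a title element that would exist solely after a script runs in a browser.

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Record whether any start tag in the raw markup carries a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.found = False

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # The class attribute may hold several space-separated classes.
            if name == "class" and value and self.css_class in value.split():
                self.found = True


def raw_html_has_class(html, css_class):
    """Check the markup as downloaded; script contents are treated as plain text."""
    finder = ClassFinder(css_class)
    finder.feed(html)
    finder.close()
    return finder.found


# Hypothetical raw HTML: the title only appears once the script runs in a browser.
raw = """
<html><body>
  <div id="app"></div>
  <script>
    document.getElementById("app").innerHTML = '<h1 class="page-title">Apple iPhone X</h1>';
  </script>
</body></html>
"""

print(raw_html_has_class(raw, "page-title"))  # prints False
```

The markup inside the script is just a JavaScript string to an HTML parser, so no element with that class exists until a browser executes the code.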

If you inspect the raw HTML (for example, view the page source in your browser with Ctrl+U in Firefox), you’ll see that the data you want is either not there at all or is inside a <script> element.
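If the data turns out to be embedded in a <script> element, you can often parse it directly instead of rendering the page. A minimal sketch, assuming the page embeds product data as JSON-LD (<script type="application/ld+json">), a common convention for structured product data; whether redsea.com does this is not confirmed here, and the sample page is hypothetical. A regular expression keeps the example dependency-free and assumes the script bodies are strict JSON.

```python
import json
import re

# Matches the body of every <script type="application/ld+json"> element.
# Assumption: the page uses JSON-LD; other sites may use a different script
# type or assign the data to a JavaScript variable instead.
JSON_LD_RE = re.compile(
    r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)


def extract_json_ld(html):
    """Return the parsed JSON values found in JSON-LD script elements."""
    blocks = []
    for body in JSON_LD_RE.findall(html):
        try:
            blocks.append(json.loads(body))
        except json.JSONDecodeError:
            # Skip script bodies that are not strict JSON.
            continue
    return blocks


# Hypothetical page source, for illustration only.
page = """
<html><head>
<script type="application/ld+json">
{"@type": "Product", "name": "Apple iPhone X 64GB Silver"}
</script>
</head><body><div id="app"></div></body></html>
"""

for block in extract_json_ld(page):
    if isinstance(block, dict) and block.get("@type") == "Product":
        print(block["name"])  # prints Apple iPhone X 64GB Silver
```

Inside Scrapy itself you would more naturally select the script bodies with response.xpath('//script[@type="application/ld+json"]/text()').getall() and pass each one to json.loads.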

Open the Network tab of your browser’s developer tools, force-reload the page (Ctrl+Shift+R in Firefox), and watch the additional requests performed in the background; one of them is likely to contain the data you want.

You can then have Scrapy perform a request similar to the one made in the background.

Gallaecio