
I need to scrape a URL that has checkboxes on it. I want to tick some of the checkboxes and scrape, then scrape again with some other checkboxes ticked. For instance:

I want to tick New and scrape, and then scrape the same URL with Used and Very Good ticked.

Is there a way to do this without making more than one request, that is, with only the request that fetches the URL?

I guess the HTML changes when you tick one of the boxes, since the listing changes when you refine the search. Any thoughts or suggestions?

Best,

Can

Can Gokalp

2 Answers


When a page changes like this, it most likely makes a new AJAX request to retrieve data from the server and then updates parts of the page with JavaScript.

To replicate that in Scrapy, use the network tools in your browser to find the requests being made, then reproduce them in your Scrapy spider.
See related issue: Can scrapy be used to scrape dynamic content from websites that are using AJAX?

Community
Granitosaurus

You are wrong.

Scrapy cannot emulate real browser behavior such as clicking a checkbox.

From the image you linked, I can see you are scraping Amazon. Open that link in a browser and click a checkbox; you will notice that the URL in the address bar changes to reflect the new filter set.

Then put that URL in your Scrapy code and do your scraping.
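If the filter shows up as a query parameter, the filtered URLs can be generated instead of copied by hand. The following is a minimal sketch using only the standard library; the base URL and the `condition` parameter name are made up for illustration, so take the real ones from the address bar after ticking each box.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_filters(url, **filters):
    """Return `url` with the given query parameters added or overwritten."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query.update(filters)
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical listing URL and parameter values.
BASE_URL = "https://www.example.com/listing?item=123"
FILTER_SETS = {
    "new": {"condition": "new"},
    "used_very_good": {"condition": "used-very-good"},
}

# One URL per filter set; each would be fetched with its own request.
urls = {label: with_filters(BASE_URL, **params) for label, params in FILTER_SETS.items()}
```

Each resulting URL can then be passed to `scrapy.Request` from the spider's callback for the unfiltered page.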

If you want to emulate real browser behavior, use Selenium (from Python), PhantomJS, or CasperJS.

Umair Ayub
  • I can get the URL and scrape that, but then I double my requests. I'm scraping many URLs and I need this for each one. If I fetch each URL two or three times with different boxes ticked, I double or triple both the number of requests and the time it takes. That's why I need a solution that ticks the boxes without making a new request. – Can Gokalp Feb 17 '17 at 17:10
  • @CanGokalp You will have to make multiple requests anyway, because when you click a checkbox, Amazon also sends an AJAX request to its server and receives an updated response. – Umair Ayub Feb 17 '17 at 17:41