
I would like to access all of the items in a given category on Amazon, but it seems that the category pages are generated via search. Incrementing the `page` parameter in the search URL only gets you as far as page 100. Is there any way to get past that limit? Here's a sample URL for books:

nyedidikeke
Andres

1 Answer


The content is loaded dynamically via an AJAX (XHR) call.

Long story short:

  • open your browser's dev tools
  • open the Network tab
  • click a page link on Amazon
  • you'll see an XHR request going to http://www.amazon.com/mn/search/ajax/ref=sr_pg_3...; this is what your Scrapy spider should call (it returns JSON)
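As the comments below point out, the JSON returned by this endpoint actually embeds the page's HTML. Here is a minimal sketch that makes no assumptions about the exact response schema: it walks the decoded JSON and collects every string that looks like markup. The field names in `sample` are hypothetical, and the "contains `<` and `>`" check is only a rough heuristic.

```python
import json
from typing import Iterator


def html_fragments(payload) -> Iterator[str]:
    """Recursively yield every string value in decoded JSON that looks like HTML."""
    if isinstance(payload, dict):
        for value in payload.values():
            yield from html_fragments(value)
    elif isinstance(payload, list):
        for item in payload:
            yield from html_fragments(item)
    elif isinstance(payload, str) and "<" in payload and ">" in payload:
        # Crude heuristic: treat any string containing angle brackets as markup.
        yield payload


# Hypothetical response body; the real keys are whatever the endpoint returns.
sample = '{"results_html": "<div class=\\"result\\">Book</div>", "page": 3}'
fragments = list(html_fragments(json.loads(sample)))
print(fragments)  # prints ['<div class="result">Book</div>']
```

In a Scrapy spider, you could pass each fragment to `scrapy.Selector(text=fragment)` and extract items with the usual CSS/XPath selectors.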

So basically, you just need to call this XHR endpoint 100 times (or find out whether you can get all the results in a single request).
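Here is a minimal sketch for generating those 100 request URLs. It assumes, as suggested in the comments, that only the `page` query parameter needs to change. The real endpoint URL is elided above, so `BASE_URL` is a hypothetical placeholder; replace it with the request you copied from the Network tab.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical placeholder: copy the real XHR URL from the browser's Network tab.
BASE_URL = "http://www.amazon.com/mn/search/ajax/ref=sr_pg_1?page=1"


def page_urls(base_url: str, last_page: int = 100) -> list[str]:
    """Return one URL per page, overriding only the `page` query parameter."""
    parts = urlsplit(base_url)
    # dict() keeps the original parameter order (duplicate keys collapse to one).
    params = dict(parse_qsl(parts.query))
    urls = []
    for page in range(1, last_page + 1):
        params["page"] = str(page)
        urls.append(urlunsplit(parts._replace(query=urlencode(params))))
    return urls


if __name__ == "__main__":
    for url in page_urls(BASE_URL)[:3]:
        print(url)
```

In a Scrapy spider, this list could serve as `start_urls`. Note that, according to the comments, requesting `page=101` returned empty results, so on its own this only covers the first 100 pages.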

Useful links:

Notes:

Hope that helps.

Community
alecxe
  • Thanks for the tip, that was helpful. I'm taking a look at those two links you shared. As for the XHR request, it looks pretty nasty, as the JSON results actually contain the page's HTML. I tried bumping the two parameters to page=101 and ref=sr_pg_100, but the results are then empty. Any idea what the rest of the parameters are for? – Andres Apr 24 '13 at 23:55
  • It's something specific to this AJAX data provider; you probably only need `page`, and maybe `sort`. I've added some notes to the answer; see if they help. – alecxe Apr 25 '13 at 08:26
  • I haven't looked at it in a while. Do you have anything? – Andres Oct 28 '14 at 20:49