18

I've already seen this question about scraping ajax, but python isn't mentioned there. I considered using scrapy, i believe they have some docs on that subject, but as you can see the website is down. So i don't know what to do. I want to do the following:

I only have one url, example.com you go from page to page by clicking submit, the url doesn't change since they're using ajax to display the content. I want to scrape the content of each page, how to do it?

Lets say that i want to scrape only the numbers, is there anything other than scrapy that would do it? If not, would you give me a snippet on how to do it, just because their website is down so i can't reach the docs.

Community
  • 1
  • 1
Lynob
  • 5,059
  • 15
  • 64
  • 114

2 Answers2

36

First of all, scrapy docs are available at https://scrapy.readthedocs.org/en/latest/.

Speaking about handling ajax while web scraping. Basically, the idea is rather simple:

  • open browser developer tools, network tab
  • go to the target site
  • click submit button and see what XHR request is going to the server
  • simulate this XHR request in your spider

Also see:

Hope that helps.

Community
  • 1
  • 1
alecxe
  • 462,703
  • 120
  • 1,088
  • 1,195
  • 1
    i was referring to this url `blog.scrapy.org/scraping-ajax-sites-with-scrapy‎` which is no longer available, thanks for reminding me of readthedocs.com – Lynob May 06 '13 at 18:06
  • 1
    Got it. If you have problems with the spider implementation, consider posting another question with what url are you trying to crawl, what button to click etc. Happy scraping! – alecxe May 06 '13 at 18:23
  • 1
    @Lynob here's the URL you're talking about: https://web.archive.org/web/20130525095330/http://blog.scrapy.org/scraping-ajax-sites-with-scrapy – 2upmedia Dec 17 '15 at 20:57
4

I found the answer very useful but I would like to make it more simple.

response = requests.post(request_url, data=payload, headers=request_headers)

request.post takes three parameters url, data and headers. Values for these three attributes can be found in the XHR request.

Copy the whole request header and form data to load into the above variables and you are good to go

Abhishek Gurjar
  • 7,426
  • 10
  • 37
  • 45
Malak
  • 41
  • 1
  • 2
    What if I can't find the XHR request? It's not visible in Chrome or FoxFox webtools. However, there is an AJAXSetup.js file. Is there any other way to find that information? – whieronymus Jun 05 '17 at 21:21