
I've been trying to scrape comments from a bunch of public Instagram posts by writing a Python crawler (Scrapy). I've looked at all the available material, especially this, but so far I've had no luck. It's worth mentioning that I've also tried making the hash from

rhx_gis + ":" + csrf_token + ":" + user agent + ":" + variables

as mentioned here, but also without luck.
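
For reference, here is a minimal sketch of the hash computation described above, assuming the signature is the MD5 hex digest of the four colon-joined parts and is sent in the X-Instagram-GIS header. The function name `build_gis_signature`, the placeholder tokens, and the keys inside `variables` are hypothetical and only illustrate the shape of the inputs; the real values have to come from the page and the request being replayed.

```python
import hashlib
import json


def build_gis_signature(rhx_gis, csrf_token, user_agent, variables):
    """Return the MD5 hex digest of 'rhx_gis:csrf_token:user_agent:variables'.

    Hypothetical helper: it follows the formula quoted in the question and
    is not a confirmed description of what the server expects.
    """
    payload = ":".join([rhx_gis, csrf_token, user_agent, variables])
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


# Placeholder inputs (assumptions, not real tokens or real query variables).
user_agent = "Mozilla/5.0"
variables = json.dumps(
    {"shortcode": "EXAMPLE", "first": 20},
    separators=(",", ":"),  # compact JSON, so the hashed string has no extra spaces
)

signature = build_gis_signature(
    "rhx-gis-placeholder", "csrf-placeholder", user_agent, variables
)

# Headers that would accompany the request; the User-Agent must be the same
# string that went into the hash.
headers = {"User-Agent": user_agent, "X-Instagram-GIS": signature}
```

Because the hash covers the exact `variables` string, any difference in whitespace or key order between the hashed string and the one placed in the URL produces a different digest, so that is one thing worth comparing byte for byte against the browser request.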

I keep getting 403 responses from the server. At first I thought it was due to my USER_AGENT setting (set to Mozilla 5....) or my headers, but I've tested those: I even analyzed the X-Instagram-GIS header of a request made from inside the browser and checked that its MD5 hash matches the one on my Scrapy request. The generated URL works fine inside a browser regardless of whether I'm logged into IG or not. However, it breaks when used inside an Incognito window, Scrapy, or the Scrapy shell.

At first I thought this meant that scraping is just not possible. However, the rarcega scraper works just fine (except that it's better suited to scraping whole user profiles than individual posts).

Any feedback or thoughts would be very much appreciated!

kiwibg
