I'm web scraping Google and am pretty sure it's blocking my requests based on IP address. I've deployed my app to Heroku (where a dyno gets a new IP address each time it restarts), and I've noticed that once the app is up, scraping stops working after about 5 requests. If I restart the dyno, I get another 5 requests before scraping stops working again. This leads me to believe that the IP address, which stays fixed while the dyno is up, is the issue. I looked into QuotaGuard Dynamic IPs (https://devcenter.heroku.com/articles/quotaguard), but I don't think that will work because Google is served over HTTPS. Has anyone dynamically proxied their requests through different IPs on Heroku before (if so, what do you recommend using)? I'm working in a Node.js environment.
Viewed 918 times
1
-
This is against Google's terms of service. Please respect the terms of service instead of trying to bypass whatever technical restrictions Google might have in place. – ChrisGPT was on strike May 17 '21 at 00:52
-
@Chris Lots of companies scrape Google search results. I don't think it's that bad. If you have any suggestions as to how to route a request through a different IP, though, it would be a big help. – nickcoding2 May 17 '21 at 01:09
1 Answer
0
Use proxies; there are plenty of paid and free ones. You can rotate to a different proxy on each request. Note that there are different types of proxies: datacenter IPs, residential IPs, and mobile IPs, which are the most expensive.
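Rotating on each request can be as simple as cycling through a list. The following is a minimal round-robin sketch, not a definitive implementation: `createProxyRotator` is a hypothetical helper name, and the proxy URLs are placeholders rather than real servers.

```javascript
// Round-robin proxy rotation sketch.
// createProxyRotator is a hypothetical helper; the URLs below are placeholders.
function createProxyRotator(proxyUrls) {
  if (!Array.isArray(proxyUrls) || proxyUrls.length === 0) {
    throw new Error('at least one proxy URL is required');
  }
  let index = 0;
  return {
    // Return the next proxy as a parsed URL and advance the cursor,
    // wrapping back to the first entry after the last one.
    next() {
      const proxy = new URL(proxyUrls[index]);
      index = (index + 1) % proxyUrls.length;
      return proxy;
    },
  };
}

const rotator = createProxyRotator([
  'http://proxy-a.example.com:8080',
  'http://proxy-b.example.com:8080',
  'http://proxy-c.example.com:3128',
]);

// Each outgoing request would call rotator.next() to pick its proxy.
const picks = [];
for (let i = 0; i < 4; i++) {
  const proxy = rotator.next();
  picks.push(`${proxy.hostname}:${proxy.port}`);
}
console.log(picks);
```

With three proxies and four requests, the fourth request wraps around to the first proxy again. A real setup would probably also want to drop proxies that fail or get blocked, which this sketch does not do.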

Goh Kok Han
-
So right now I'm simulating a Linux x86_64 client. If my actual client is a mobile device (and not a Linux machine), would I still have to use the 'expensive Mobile IPs'? Do you have any recommendations for well-documented proxy integrations? Just as an FYI, I'm using Express for my POST and GET requests. – nickcoding2 May 17 '21 at 11:10
-
Using Express, though, how do you falsify the IP that you're sending the request from? For example, I was looking at this package: https://www.npmjs.com/package/free-proxy but it doesn't seem to have any documentation about how to actually use the proxy once you have it. – nickcoding2 May 17 '21 at 11:33
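For what it's worth, Express only handles the requests coming into the app; the proxy applies to the outgoing request the app makes, and nothing is falsified, since the target simply sees the proxy's address. HTTPS targets can still go through an HTTP proxy by opening a tunnel with the `CONNECT` method. Below is a rough sketch using only Node's built-in `http` and `https` modules. `buildConnectOptions` and `fetchViaProxy` are hypothetical helper names, and it assumes a plain HTTP proxy that allows `CONNECT` and needs no authentication.

```javascript
const http = require('http');
const https = require('https');

// Build the options for the CONNECT request sent to the proxy.
// Assumes an unauthenticated HTTP proxy; port falls back to 80 if the URL has none.
function buildConnectOptions(proxyUrl, targetHost, targetPort = 443) {
  const proxy = new URL(proxyUrl);
  return {
    host: proxy.hostname,
    port: Number(proxy.port) || 80,
    method: 'CONNECT',
    path: `${targetHost}:${targetPort}`,
    headers: { Host: `${targetHost}:${targetPort}` },
  };
}

// Fetch an HTTPS URL through the proxy; resolves with the status code and body.
function fetchViaProxy(proxyUrl, targetUrl) {
  const target = new URL(targetUrl);
  return new Promise((resolve, reject) => {
    const connectReq = http.request(buildConnectOptions(proxyUrl, target.hostname));
    connectReq.on('error', reject);
    // 'connect' fires when the proxy answers the CONNECT request.
    connectReq.on('connect', (res, socket) => {
      if (res.statusCode !== 200) {
        socket.destroy();
        reject(new Error(`proxy refused CONNECT: ${res.statusCode}`));
        return;
      }
      // Reuse the tunnelled socket; TLS is negotiated with the target itself,
      // so the proxy only relays encrypted bytes.
      const req = https.get({
        host: target.hostname,
        path: target.pathname + target.search,
        socket,
        agent: false,
      }, (response) => {
        let body = '';
        response.setEncoding('utf8');
        response.on('data', (chunk) => { body += chunk; });
        response.on('end', () => resolve({ status: response.statusCode, body }));
      });
      req.on('error', reject);
    });
    connectReq.end();
  });
}
```

A call would look something like `fetchViaProxy('http://proxy.example.com:8080', 'https://www.example.com/')`, where both URLs are placeholders. Inside an Express route handler you would `await` that call and send the result back to your own client.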