I'm trying to crawl clothing websites to build a list of great deals and products to look out for, but I've noticed that some of the websites I try to load never load. How are websites able to block Selenium WebDriver HTTP requests? Do they look at the headers or something? Can you give me a step-by-step explanation of how Selenium WebDriver sends requests, how the server receives them, and how it is able to block them?
2 Answers
Selenium uses a real web browser (typically Firefox or Chrome) to make its requests, so the website probably has no idea that you're using Selenium behind the scenes.
If the website is blocking you, it's probably because of your usage patterns, e.g. you're clogging up their web server by making 1,000 requests every minute. That's rude. Don't do that!
One exception would be if you're using Selenium with the headless HtmlUnitDriver, which simulates a browser rather than driving a real one. The website can detect that.
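On the usage-pattern point, here is a minimal sketch of spacing out page loads so the crawler doesn't hammer the server. The function name `crawl_politely`, its parameters, and the 5 to 10 second delay range are illustrative assumptions, not values any particular site requires; `fetch` is whatever callable actually loads a page.

```python
import random
import time


def crawl_politely(urls, fetch, min_delay=5.0, max_delay=10.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing a random interval between calls.

    min_delay / max_delay are assumed values in seconds; tune them to the site.
    The sleep parameter exists only so the pause can be swapped out in tests.
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Randomize the pause so requests don't arrive on a fixed beat.
            sleep(random.uniform(min_delay, max_delay))
        results.append(fetch(url))
    return results
```

With Selenium, `fetch` could be a small function that calls `driver.get(url)` and returns `driver.page_source`.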

- Is it possible that the website is also blocking my AWS IP address? Are websites allowed to block all Amazon IPs, knowing that they're not customers? – PeepingHog Nov 22 '16 at 20:47
- I think websites are _allowed_ to do pretty much anything they want, yes. – John Gordon Nov 22 '16 at 21:09
- Did the website block you from the very first day, or only after you'd accessed it for a while? – John Gordon Nov 22 '16 at 21:26
- Since the very first time I used Selenium on it, I never got through. – PeepingHog Nov 22 '16 at 21:44
- Then they might be blocking the entire AWS domain. – John Gordon Nov 22 '16 at 21:47
It's very likely that the website is blocking you because of your AWS IP. Not only does that tell the website that somebody is probably scraping it programmatically, but most websites also limit the number of queries they will accept from any one IP address. You most likely need a proxy service to route your requests through.
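As a rough sketch of what routing Selenium through a proxy can look like with Chrome: the `--proxy-server` switch is a Chrome command-line flag, while the helper names `proxy_argument` and `make_driver` and the host `proxy.example.com` are placeholders made up for illustration. It assumes the Python `selenium` package and a matching Chrome driver are installed, and that the proxy needs no authentication.

```python
def proxy_argument(host, port, scheme="http"):
    """Build Chrome's --proxy-server switch for the given proxy (hypothetical helper)."""
    return f"--proxy-server={scheme}://{host}:{port}"


def make_driver(host, port):
    """Start Chrome with all of its traffic sent through the given proxy."""
    # Imported here so the helper above stays usable without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument(proxy_argument(host, port))
    return webdriver.Chrome(options=options)


# Example with a placeholder proxy address:
# driver = make_driver("proxy.example.com", 8080)
# driver.get("https://example.com")
```

The website then sees the proxy's IP address instead of the AWS one, so whether this helps depends on the proxy's own address not being blocked too.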
