4

So I'm trying to crawl clothing websites to build a list of great deals/products to look out for, but I notice that some of the websites I try to load don't load. How are websites able to block Selenium WebDriver HTTP requests? Do they look at the headers or something? Can you give me a step-by-step explanation of how Selenium WebDriver sends requests, and how the server receives them and is able to block them?

PeepingHog

2 Answers

6

Selenium uses a real web browser (typically Firefox or Chrome) to make its requests, so the website probably has no idea that you're using Selenium behind the scenes.

If the website is blocking you, it's probably because of your usage patterns (e.g. you're clogging up their web server by making 1000 requests every minute). That's rude. Don't do that!

One exception would be if you're using Selenium in "headless" mode with the HtmlUnitDriver, which simulates a browser instead of driving a real one. The website can detect that.
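To avoid the usage-pattern problem described above, one option is to enforce a minimum delay between page loads. The following is a minimal sketch, not a definitive implementation: the `Throttle` class and its `min_interval` parameter are hypothetical names introduced here, and the right delay depends on the site you're crawling.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive requests (hypothetical helper)."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests (assumed value)
        self._clock = clock               # injectable so the class can be tested
        self._sleep = sleep
        self._last = None                 # time of the previous request, if any

    def wait(self):
        """Block until at least min_interval seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

You would create one instance, e.g. `throttle = Throttle(min_interval=2.0)`, and call `throttle.wait()` right before each `driver.get(url)`.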

John Gordon
0

It's very likely that the website is blocking you because of your AWS IP address. Not only does that tell the website that somebody is probably scraping it programmatically, but most websites also limit the number of requests they will accept from any one IP address. You most likely need a proxy service to route your requests through.
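As a rough sketch of routing Selenium traffic through a proxy, assuming Chrome: Chrome accepts a `--proxy-server` command-line switch, which can be passed through `ChromeOptions.add_argument`. The helper name `chrome_proxy_argument` and the host `proxy.example.com` are placeholders made up for illustration; substitute whatever your proxy provider gives you.

```python
def chrome_proxy_argument(host, port, scheme="http"):
    """Build the Chrome command-line switch that sends traffic through a proxy."""
    return f"--proxy-server={scheme}://{host}:{port}"


# Usage with Selenium (left commented out because it needs Selenium and Chrome installed;
# proxy.example.com:8080 is a placeholder, not a real proxy):
#
# from selenium import webdriver
# options = webdriver.ChromeOptions()
# options.add_argument(chrome_proxy_argument("proxy.example.com", 8080))
# driver = webdriver.Chrome(options=options)
```

Note that a proxy only changes the IP address the site sees. It does not make a high request rate any more polite.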

Virtual