I am running a Python script that scrapes a website. The site uses Imperva to detect automated scripts crawling its pages, and Imperva blocks my IP as soon as I run the script. I read a suggestion to add time.sleep(random.randint(a, b)) to the script (to try to mimic human behaviour), but it didn't work, or perhaps it just doesn't work as a standalone method. If it's the Chrome driver itself that they detect, then I guess it would be impossible to avoid. Does anyone have any practical suggestions on things that I could include in my script to bypass this? Thanks in advance.
1 Answer
Introduction
There are many different components that need to be added to a web scraper to make it undetectable. I recommend using the below code to test your current level of detection:
driver.get("https://bot.sannysoft.com/")
More than likely, you will fail most of those tests right off the bat. Fortunately, it's easy to configure a scraper that passes all of them and is much harder to detect.
Selenium-Stealth
selenium-stealth is a Python package that is used to avoid detection. Simply...
pip install selenium-stealth
and follow the below configuration:
from selenium_stealth import stealth

# driver is an existing Selenium Chrome WebDriver instance
stealth(driver,
        user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.53 Safari/537.36',
        languages=["en-US", "en"],
        vendor="Google Inc.",
        platform="Win32",
        webgl_vendor="Intel Inc.",
        renderer="Intel Iris OpenGL Engine",
        fix_hairline=True,
        )
Your web scraper should now pass all of the tests. Next, try this solution on the Imperva site.
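For context, the stealth() configuration assumes a driver already exists. A minimal end-to-end sketch, assuming Selenium 4 with Chrome installed and selenium-stealth installed from pip, might look like the following. The names STEALTH_SETTINGS, build_stealth_driver and run_detection_check, as well as the screenshot filename, are made up for illustration, and the excludeSwitches option is an extra taken from common selenium-stealth setups rather than from the configuration above.

```python
# Settings mirror the stealth() call in the answer; the dict name is illustrative.
STEALTH_SETTINGS = {
    "languages": ["en-US", "en"],
    "vendor": "Google Inc.",
    "platform": "Win32",
    "webgl_vendor": "Intel Inc.",
    "renderer": "Intel Iris OpenGL Engine",
    "fix_hairline": True,
}

TEST_URL = "https://bot.sannysoft.com/"


def build_stealth_driver():
    """Create a Chrome driver and apply selenium-stealth to it."""
    # Imported here so this module still loads where Selenium is not installed.
    from selenium import webdriver
    from selenium_stealth import stealth

    options = webdriver.ChromeOptions()
    # Assumption: drop the "enable-automation" switch that Chrome sets by default.
    options.add_experimental_option("excludeSwitches", ["enable-automation"])

    driver = webdriver.Chrome(options=options)
    stealth(driver, **STEALTH_SETTINGS)
    return driver


def run_detection_check(screenshot_path="sannysoft.png"):
    """Load the test page and save a screenshot to inspect by hand."""
    driver = build_stealth_driver()
    try:
        driver.get(TEST_URL)
        driver.save_screenshot(screenshot_path)
    finally:
        # Always close the browser, even if the page load fails.
        driver.quit()
```

Calling run_detection_check() opens the test page and writes a screenshot, so the results table can be compared before and after adding the stealth() call.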
More information
If you are still getting blocked, I recommend looking into the random-user-agent library to cycle the user agent passed as the "user_agent" argument of the selenium-stealth configuration. Otherwise, you could pay for a proxy provider to disguise your IP. Keep in mind, though, that proxy networks currently do not have a selenium configuration.
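Both ideas can be sketched without committing to a particular library or provider. The snippet below is an illustration only: it picks a user agent from a small hand-written pool with random.choice (a stand-in for what random-user-agent would supply, not that library's API) and builds Chrome's --proxy-server command-line switch. The pool contents, the function names and the proxy address in the usage note are placeholders.

```python
import random

# Hand-written pool, purely illustrative. All entries are Windows user agents
# so they stay consistent with platform="Win32" in the stealth() call.
USER_AGENT_POOL = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/83.0.4103.53 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36",
]


def pick_user_agent(rng=random):
    """Return one user agent string chosen at random from the pool."""
    return rng.choice(USER_AGENT_POOL)


def chrome_arguments(proxy=None):
    """Build the Chrome switches needed for an optional proxy."""
    args = []
    if proxy:
        # --proxy-server is a standard Chromium switch, e.g. socks5://host:port
        args.append(f"--proxy-server={proxy}")
    return args
```

In use, pick_user_agent() would be passed as user_agent= to stealth(), and each string from chrome_arguments("socks5://127.0.0.1:1080") would go to options.add_argument() before the driver is created. The --proxy-server switch carries no credentials, so an authenticated proxy would likely need a different approach.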
Information on Proxy Network Selenium Configuration: Python Selenium Proxy Network
Information on Selenium Detectability in the Cloud: Python Selenium AWS Lambda Change WebGL Vendor/Renderer For Undetectable Headless Scraper

- I've never heard of selenium-stealth before. Kudos for bringing that up! – May 23 '22 at 21:52
- Did selenium stealth work for you? Me not. – Jack Feb 27 '23 at 00:29
- As an alternative, you can try undetected-chromedriver. – Toolmaker Mar 02 '23 at 15:59
- selenium-stealth doesn't do anything which leads me to believe that might be some extra bit of work if you already blacklisted... – Alex Zubkov May 30 '23 at 22:01
- @AlexZubkov I would recommend proxy cycling then. Sounds like they blacklisted your IP – Luke Hamilton Jun 01 '23 at 16:34
- @LukeHamilton yep, selenium-wire library with socks5 proxy does the trick! – Alex Zubkov Jun 01 '23 at 21:32