I'm using a Webdriver-based crawler to collect informtion from a chinese news website (Toutiao). Since 2020 Fed. 16, I found the site does not reponse any data to the chrome controlled by webdriver program, but chrome started manually work fine (as the below figure showed).
The left side is chrome started manually, the right side is chrome controlled by webdriver.
Two chromes is working on the same IP, and I have defined the same User-agent for the two chrome. Moreover, I use following codes (from DebanjanB) to remove "navigator.webdriver" (as shown in above figure, the code is successful):
options = webdriver.ChromeOptions()
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options, executable_path=r'./chromedriver')
driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
"source": """
Object.defineProperty(navigator, 'webdriver', {
get: () => undefined
})
"""
})
Summery, the same IP, User-agent, and "navigator.webdriver" is removed. Why does the website still detect my chrome is controlled by webdriver?
UPDATING
The website acquires content through access a url. If I copy and access the url (with encrypted parameters) from manully starting chrome to the webdriver controlled chrome, the server will send corrent informtiaon to webdriver.
So, the website definitly detect webdriver in generating the url and its encrypted parameters.
UPDATING 2
The disuss "Can a website detect when you are using selenium with chromedriver?" does not solve the problem, Please note !