I have been reading a bunch of Selenium / web scraping Python posts about bypassing web page security so I can scrape this website: https://www.coches.net/.

I've seen there are many ways to get past a site's bot protection by making the browser look like a human user, for example by setting a realistic user agent. After everything I found, I copied this code from another user:

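from time import sleep

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
# Drop the automation switch and the automation extension that Chrome adds by default.
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option("useAutomationExtension", False)
# Selenium 4 takes the driver path through a Service object instead of executable_path.
driver = webdriver.Chrome(service=Service(r"./chromedriver.exe"), options=options)
# Hide navigator.webdriver on every new document; a plain execute_script call
# only patches the current page and is lost on the next navigation.
driver.execute_cdp_cmd(
    "Page.addScriptToEvaluateOnNewDocument",
    {"source": "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"},
)
# Override the user agent reported to the site.
driver.execute_cdp_cmd('Network.setUserAgentOverride', {"userAgent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.53 Safari/537.36'})
print(driver.execute_script("return navigator.userAgent;"))
driver.get('https://www.coches.net/')
sleep(5)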

But it doesn't work either. By "doesn't work" I mean that the web page blocks me because it thinks I'm a bot. I haven't had this experience with easier web pages, so I don't know what to do. I'm at a dead end and would appreciate any help or advice you could give.

I would also like to point out that I'm self-taught and very new to this. I've used Python before, but I am by no means an expert, so any resources for learning about this would be appreciated too. Thanks!

DheltaHalo
  • Could you edit your question and expand on "it doesn't work" to explain what happens; what specifically doesn't work? – Lucan Nov 09 '21 at 16:00
  • Thanks for the tip! Already changed it. Basically what I mean by doesn't work is that the webpage blocks me because it thinks I'm a bot. – DheltaHalo Nov 09 '21 at 16:24
  • I would suggest inspecting the requests from both your real browser and via Selenium to see what might be tripping their bot detection system. It seems to place your IP on a temporary blacklist which inhibits testing. – Lucan Nov 09 '21 at 16:57
  • How can I see exactly where it fails? I know how to view the requests, but how can I tell at which request the web page rejects me? – DheltaHalo Nov 09 '21 at 19:42
  • You can't unless you can get your hands on the code that identifies automated requests. It's a guessing game until you find the trigger, unfortunately. – Lucan Nov 10 '21 at 00:34
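
The comparison suggested in the comments can be started with a small helper like the one below. It is only a sketch: it assumes you have copied the request headers for the same URL from your real browser's DevTools and from the Selenium session into two dictionaries. The function name `diff_headers` and the sample values are made up for illustration, and headers are just one of several signals a bot detector might use, so an empty diff does not mean the session is indistinguishable.

```python
def diff_headers(browser_headers, selenium_headers):
    """Return the headers whose values differ between two captures.

    Header names are compared case-insensitively. The result maps each
    differing name to a (browser_value, selenium_value) tuple, with None
    standing in for a header that is missing on one side.
    """
    real = {name.lower(): value for name, value in browser_headers.items()}
    automated = {name.lower(): value for name, value in selenium_headers.items()}
    report = {}
    for name in sorted(real.keys() | automated.keys()):
        if real.get(name) != automated.get(name):
            report[name] = (real.get(name), automated.get(name))
    return report


# Hypothetical captures, invented for this example; paste your own here.
browser_capture = {
    "User-Agent": "Mozilla/5.0 (example real browser)",
    "Accept-Language": "es-ES,es;q=0.9",
    "Accept": "text/html",
}
selenium_capture = {
    "user-agent": "Mozilla/5.0 (example automated browser)",
    "accept": "text/html",
}

for name, (real_value, automated_value) in diff_headers(
    browser_capture, selenium_capture
).items():
    print(f"{name}: browser={real_value!r} selenium={automated_value!r}")
```

Each header that shows up in the report is a candidate to align in the Selenium session and retest, one at a time, which keeps the guessing game at least systematic.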

0 Answers