After several hours of searching Stack Overflow and other sites, I haven't found a solution to my problem. I would like to scrape the page https://www.bstn.com/eu_de with Python, Selenium and ChromeDriver.
When I visit the page in a normal browser such as Firefox or Chrome, it opens without any issues. With Selenium, however, I only get a blank white page back. My script already includes the standard options recommended in many Stack Overflow answers:
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option("useAutomationExtension", False)
options.add_argument("--disable-blink-features=AutomationControlled")
I'm also rotating and updating the user agent on every request.
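For reference, the user agent rotation could look roughly like the sketch below. It assumes the user agent is passed to Chrome through the `--user-agent` command-line switch; the helper name `user_agent_argument` and the strings in `USER_AGENTS` are placeholders for illustration, not the values from my script.

```python
import random

# Placeholder pool; these strings are examples, not the ones from the actual script.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
]


def user_agent_argument(pool=USER_AGENTS, rng=random):
    """Return a Chrome command-line switch that sets a randomly chosen user agent."""
    return f"--user-agent={rng.choice(pool)}"


# Usage with a ChromeOptions object (requires Selenium to be installed):
# options.add_argument(user_agent_argument())
```

Because the switch is read when Chrome starts, this variant picks a new user agent per browser launch rather than per individual HTTP request.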
Further investigation showed that the server responds with a 429 error. A 429 normally means that too many requests have been made, but I have tried fewer than 10 times and the page still loads in normal browsers, so that doesn't seem to be the cause.
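A 429 response may carry a `Retry-After` header, given either as a number of seconds or as an HTTP date, which says how long the client is asked to wait. Whether the response in my case includes that header is something I have not confirmed, so the following is only a sketch of how it could be checked; `retry_after_seconds` is a hypothetical helper name.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(status, headers, now=None):
    """Return the wait time in seconds requested by a 429 response, or None."""
    if status != 429:
        return None
    # Header names are case-insensitive, so normalise them before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("retry-after")
    if raw is None:
        return None  # 429 without a wait hint
    raw = raw.strip()
    if raw.isdigit():
        return float(raw)  # delta-seconds form, e.g. "30"
    try:
        when = parsedate_to_datetime(raw)  # HTTP-date form
    except (TypeError, ValueError):
        return None  # value is neither a number nor a parseable date
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

If this returns `None` for a 429, the response gives no explicit wait time, which would fit the impression that the status is not a plain rate limit.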
Another look at Chrome's Network -> Headers tab shows that the server returning the 429 error is Cloudflare, so Cloudflare seems to be involved in some way. I've compared the request headers of a successful connection (right in the picture) with those of a connection that returned 429 (left in the picture). (Image: Headers comparison)
The only differences are a slightly larger cookie set (all cookies were deleted before the requests were made), a referer header, a sec-fetch-site value of same-origin, and sec-fetch-user: ?1. Adding or changing these headers with a tool called Selenium Wire doesn't affect the problem in any way.
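The comparison of the two header sets can also be done programmatically instead of by eye. Below is a minimal sketch, assuming both sets have been copied into plain dictionaries; `diff_headers` is a hypothetical helper and the sample values are illustrative, not the real captured headers.

```python
def diff_headers(working, failing):
    """Group the differences between two header sets (names compared case-insensitively)."""
    a = {name.lower(): value for name, value in working.items()}
    b = {name.lower(): value for name, value in failing.items()}
    return {
        "only_in_working": sorted(a.keys() - b.keys()),
        "only_in_failing": sorted(b.keys() - a.keys()),
        "changed": {
            name: (a[name], b[name])
            for name in sorted(a.keys() & b.keys())
            if a[name] != b[name]
        },
    }


# Illustrative values only.
working = {
    "Referer": "https://www.bstn.com/eu_de",
    "Sec-Fetch-Site": "same-origin",
    "Sec-Fetch-User": "?1",
}
failing = {"sec-fetch-site": "none"}
print(diff_headers(working, failing))
```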
I also identified a request cookie: "name":"KP_REF","domain":"www.bstn.com","value":""
that is created in the normal browser but doesn't exist when using Selenium. Adding:
driver.add_cookie({"name":"KP_REF","domain":"www.bstn.com","value":""})
also doesn't change anything.
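One detail that matters here: Selenium only accepts a cookie for the domain of the page that is currently loaded, so the cookie has to be added after a first navigation and the page requested again afterwards. A minimal sketch of that order of calls, where `add_cookie_then_reload` is a hypothetical helper and `driver` is any object with Selenium's `get` and `add_cookie` methods:

```python
def add_cookie_then_reload(driver, url, cookie):
    """Load url, set the cookie for that domain, then request the page again."""
    driver.get(url)            # the cookie's domain must match the loaded page
    driver.add_cookie(cookie)  # cookie dict needs at least "name" and "value"
    driver.get(url)            # the second request now carries the cookie


# Usage (requires Selenium and ChromeDriver to be installed):
# from selenium import webdriver
# driver = webdriver.Chrome()
# add_cookie_then_reload(
#     driver,
#     "https://www.bstn.com/eu_de",
#     {"name": "KP_REF", "domain": "www.bstn.com", "value": ""},
# )
```

This only rules out the ordering as a cause; it does not show that the missing cookie is what triggers the 429.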
What am I missing or doing wrong that prevents me from accessing this page? I'm not using headless Chrome so far, and I depend on ChromeDriver because it is the standard inside our application. I also need to stay with ChromeDriver because ChromeDriverManager doesn't seem to work with undetected-chromedriver.