
So there is this chat website where you go, click "connect as guest", enter your username, complete a ReCaptcha v2, click "CONNECT" and you're in.

If I do this in my browser, it works normally. If I do this in the chromedriver browser controlled by Selenium, I get an error. (The error message is "please enter the captcha", but the text itself is irrelevant. The server is clearly detecting me, because the error is triggered by a special response coming from the server.)

Important: in both cases (my browser and the chromedriver browser), I'm doing everything manually! I only use Selenium to launch the browser and then proceed by hand from there. I even tried adding an option so that the chromedriver browser uses my actual browser profile. It loads with my history, cookies, everything. But when I try to enter the chat room, I get the error.

I also looked online and found some people claiming that websites can detect Selenium by checking for a specific JavaScript variable prefixed "cdc_". I hex-edited the chromedriver binary, renamed the variable as instructed online, and tried again: same result. I spent hours trying to figure this out...

There is one interesting thing that could help find the problem: if my browser is already open and I run the Python script with chromedriver using my profile, the chromedriver browser starts and the Python code raises an error saying the profile is already in use (but the browser stays open). If I then try to access the chat room with this chromedriver browser, it works.

EDIT: I've looked at the requests through Fiddler for both cases and the headers are 100% identical! And I mean 100%! Even the sessionid, PHPSESSID and cfuid are the same since it uses the same profile.

The only thing that changes is the POST request data: specifically the captcha response (because it's a different one each time) and another variable, s. This variable s is somehow calculated by a weird JavaScript file called challenge. I'm not sure what it does or how it works.
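For anyone repeating this comparison, here is a minimal sketch of how two captured POST bodies could be diffed field by field. The helper name `differing_fields` and the sample bodies are made up for illustration (the values are placeholders, not real captures); only `g-recaptcha-response` and `s` correspond to fields discussed above.

```python
from urllib.parse import parse_qs


def differing_fields(body_a, body_b):
    """Return the sorted names of form fields whose values differ between two bodies."""
    a = parse_qs(body_a, keep_blank_values=True)
    b = parse_qs(body_b, keep_blank_values=True)
    # Compare over the union of keys so a field missing on one side is also reported.
    return sorted(key for key in a.keys() | b.keys() if a.get(key) != b.get(key))


# Placeholder bodies standing in for what Fiddler shows for each browser.
browser_body = "username=guest&g-recaptcha-response=AAA&s=111"
selenium_body = "username=guest&g-recaptcha-response=BBB&s=222"

print(differing_fields(browser_body, selenium_body))
```

This only narrows down which fields differ. It says nothing about how s is computed, which happens in the page's JavaScript.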

EDIT: SOLVED. I fixed this by adding an option:

options.add_argument("--disable-blink-features=AutomationControlled")
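For context, a minimal sketch of where that option goes in a launch script. The helper names `build_chrome_arguments` and `launch_chrome` and the optional `profile_dir` parameter are illustrative, not from my original script. It assumes Selenium 4 with a chromedriver that Selenium can locate. As far as I understand, the switch turns off the Blink feature that makes Chrome report `navigator.webdriver` as true, but I have not verified that this is the property the site checks.

```python
def build_chrome_arguments(profile_dir=None):
    """Return the Chrome command-line switches used for the session."""
    args = ["--disable-blink-features=AutomationControlled"]
    if profile_dir is not None:
        # Optional: reuse an existing Chrome profile (history, cookies, ...).
        args.append(f"--user-data-dir={profile_dir}")
    return args


def launch_chrome(profile_dir=None):
    """Start Chrome through Selenium with the switches above applied."""
    # Imported here so the helper above works even without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in build_chrome_arguments(profile_dir):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

Calling `driver = launch_chrome()` and then `driver.execute_script("return navigator.webdriver")` is one way to see what a page script would read from that property before and after adding the switch.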

Flabian
  • This is the topic of an ongoing arms race. Even if we knew which tactic was used in your specific instance, if it were publicly disclosed the folks behind recaptcha would find another one and repeat -- so any knowledge base entry on the topic wouldn't stay useful for long. – Charles Duffy Sep 13 '20 at 18:48
  • @CharlesDuffy This is not recaptcha related. As I said, I completed the captcha manually. As a test I even tried completing the captcha in my main browser and using the result in the Selenium browser (it's possible to pass a captcha response from one browser to another; you just have to paste it into a hidden textarea). Same thing. – Flabian Sep 13 '20 at 18:50
  • Ways to try to detect headless browsers include enumerating fonts, inspecting screen resolution and DPI, looking at available extensions and their behavior (including things like WebGL), etc. – Charles Duffy Sep 13 '20 at 18:50
  • The point stands: ongoing arms race. We prefer questions whose answers are likely to remain accurate over time. – Charles Duffy Sep 13 '20 at 18:51
  • @CharlesDuffy I am not using a headless browser. Please read the question carefully. Everything is done manually. – Flabian Sep 13 '20 at 18:51
  • There are lots of companies offering services to protect websites from bots (Selenium, puppeteer, requests, ...) - they won't tell you how to bypass their services or what is going on behind the scenes. – Maurice Meyer Sep 13 '20 at 18:55
  • Whether you're using a headless browser is not particularly pertinent. The point, repeating myself once more, is that you're asking us to get involved in an arms race. As Maurice says, there are folks actively researching detection methods, and any detection mechanism for which an evasion method becomes public is likely to change to defeat that evasion. – Charles Duffy Sep 13 '20 at 18:58
  • @CharlesDuffy I understand websites are constantly trying to detect whether you're a human or not. But at the same time, I know that we have access to every single bit of information our browser processes and sends back. So if this detection happened locally, people would be able to see it. Now let's say it happens on their server. My question remains: what gives? I can see exactly all the requests I'm making. I can see they are 100% identical. This can't be black magic, can it? – Flabian Sep 13 '20 at 19:18
  • @Flabian: It is black magic, and I assume the detection is not based on requests only (that might work for proxies/VPNs); it's done in JavaScript (there might be special variables/attributes like `navigator.webdriver`)... – Maurice Meyer Sep 13 '20 at 19:30
