
I am trying to navigate to https://my.waveapps.com using headless Chrome, and I get a 456 Access Denied response.

I have no idea why.

Navigating to the same url with the exact same chromium binary in normal mode works fine. Only adding the --headless option breaks it.

I am using Chromium 66.0.3333.0

Vic Seedoubleyew

1 Answer


There are 2 things to do to avoid Headless Chrome being detected:

  • change user agent
  • if you are using it through chromedriver, for example by using Selenium, then you also need to patch the chromedriver executable

Changing user agent

When used in headless mode, Chrome changes its user agent, turning Chrome into HeadlessChrome. As a result, it is easily detectable by anyone who wants to deny headless browsers.

So be sure to change user agent to look like a normal browser, for example:

chrome \
    --disable-gpu \
    --headless \
    --remote-debugging-port=9222 \
    --user-agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36' \
    'https://my.waveapps.com/login/'
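To pick a plausible replacement value, one option is to take the user agent that the headless browser reports and rewrite the HeadlessChrome token. This is only a minimal sketch, assuming the headless user agent contains the token HeadlessChrome as described above; the function name `strip_headless` and the sample string are illustrative, not taken from any tool:

```shell
#!/bin/sh
# Hypothetical helper: turn a headless user agent string into its
# regular-looking equivalent by rewriting the "HeadlessChrome" token.
strip_headless() {
    printf '%s\n' "$1" | sed 's/HeadlessChrome/Chrome/'
}

# Sample value for illustration only; check what your own build reports.
headless_ua='Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/66.0.3333.0 Safari/537.36'

# Print the rewritten string, ready to be passed to --user-agent.
strip_headless "$headless_ua"
```

A string that does not contain the token is printed unchanged, so the helper should be safe to apply to a user agent that is already "normal".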

Patching chromedriver

As explained in this post, the website you are trying to reach uses Distil Networks to detect headless browsers. To avoid being detected, you need to replace every "cdc_" string in the chromedriver executable, for example by running the following command:

perl -pi -e 's/cdc_/aaa_/g' /path/to/chromedriver

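Replace aaa with any three-character combination you like. Keeping the replacement the same length as the original string means the size of the binary does not change.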

Seems to work for me:
my.waveapps.com

Vic Seedoubleyew
Jonathan
  • Thank you VERY much for the reply! I have tried it, and while I got it to work in some cases, I couldn't get it to work reliably: I had to clear cookies for it to work, and still, some workflows don't work. For example, if I run it, clear cookies, and reload, it works. Then if I quit Chromium and relaunch it, it first shows Access Denied, and I have to clear cookies again for it to work. Using chromedriver and Selenium, clearing cookies before starting doesn't get it to work. Any idea? – Vic Seedoubleyew Jan 29 '18 at 21:35
  • Actually I am getting it to work pretty reliably manually. It seems to go wrong when I go through chromedriver. I checked the user agent with `System.out.println("user agent : " + (String) ((JavascriptExecutor) driver).executeScript("return navigator.userAgent;"));` and it does print the updated value, so the problem seems to be somewhere else. Any idea? – Vic Seedoubleyew Jan 30 '18 at 19:58
  • I am trying to troubleshoot the problem, but I am having a hard time understanding how to debug it. I have set chromedriver to verbose mode and looked at the logs, but they don't tell me much more. Apparently this is due to the site using Distil to block bots, but I still can't find out why it blocks only when using chromedriver. – Vic Seedoubleyew Jan 30 '18 at 20:11
  • Maybe see this answer: https://stackoverflow.com/a/33403473/8512324. It says that Distil Networks' purpose is to block web-scraping... – Jonathan Jan 30 '18 at 20:37
  • Thanks! I had seen it, but hadn't taken enough time to go through it. I tried patching the chromedriver binary and it worked. Thank you so much for your help! – Vic Seedoubleyew Jan 30 '18 at 21:26