
I'm crawling a website secured with Cloudflare and sometimes get an error when I'm redirected to a page with reCAPTCHA. The page cannot even be loaded because of a JavaScript error. The code fails in the #getPage method and I have no idea why.

Here is the code. It works fine for normal pages but fails on the confirmation page:

    import com.gargoylesoftware.htmlunit.BrowserVersion;
    import com.gargoylesoftware.htmlunit.WebClient;
    import com.gargoylesoftware.htmlunit.html.HtmlPage;

    final WebClient webClient = new WebClient(BrowserVersion.CHROME);
    webClient.getOptions().setJavaScriptEnabled(true);

    // This is the call that throws on the reCAPTCHA confirmation page
    final HtmlPage page = webClient.getPage("https://mydummy.site");

    // Give background JavaScript up to 10 seconds to finish
    webClient.waitForBackgroundJavaScript(10000);

    // Poll up to two more times while background jobs are still pending
    int pendingJobs = webClient.waitForBackgroundJavaScript(200);
    int loopCount = 0;
    while (pendingJobs > 0 && loopCount < 2) {
        ++loopCount;
        pendingJobs = webClient.waitForBackgroundJavaScript(200);
    }

Logs:

java.lang.RuntimeException: com.gargoylesoftware.htmlunit.ScriptException: Wrapped com.gargoylesoftware.htmlunit.ScriptException: Wrapped com.gargoylesoftware.htmlunit.ScriptException: TypeError: Cannot find function start in object [object MessagePort]. (https://www.gstatic.com/recaptcha/api2/v1536705955372/recaptcha__en.js#249) (https://www.gstatic.com/recaptcha/api2/v1536705955372/recaptcha__en.js#253)
    at com.gargoylesoftware.htmlunit.html.HtmlPage.initialize(HtmlPage.java:305)
    at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:539)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:399)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:316)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:467)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:449)
    at Main.htmlUnit(Main.java:156)
    at Main.main(Main.java:43)
Caused by: com.gargoylesoftware.htmlunit.ScriptException: Wrapped com.gargoylesoftware.htmlunit.ScriptException: Wrapped com.gargoylesoftware.htmlunit.ScriptException: TypeError: Cannot find function start in object [object MessagePort]. (https://www.gstatic.com/recaptcha/api2/v1536705955372/recaptcha__en.js#249) (https://www.gstatic.com/recaptcha/api2/v1536705955372/recaptcha__en.js#253)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:892)
    at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:616)
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:532)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:772)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:748)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:104)
    at com.gargoylesoftware.htmlunit.html.HtmlPage.loadExternalJavaScriptFile(HtmlPage.java:992)
    at com.gargoylesoftware.htmlunit.html.HtmlScript.executeScriptIfNeeded(HtmlScript.java:371)
    at com.gargoylesoftware.htmlunit.html.HtmlScript$2.execute(HtmlScript.java:246)
    at com.gargoylesoftware.htmlunit.html.HtmlPage.initialize(HtmlPage.java:298)
Chris Dust

1 Answer


We have been struggling with this issue as well. Our test suite ran perfectly until late 2018, when this issue broke all of our logins. I believe Google put this in deliberately to thwart automated attempts to solve CAPTCHAs, because solving one part of the problem seems only to lead to another. Both loading the page and submitting it cause issues, even if you tell HtmlUnitDriver to ignore all JavaScript errors.

I have tried several options at this point. If you use the Google-specified test site key, the errors go away. So if you have full server-side control over which site key the page is rendered with, you are OK. Remember to make sure the test site key is also used when the page is re-rendered after validation errors and in all similar cases, otherwise you will get that error again.
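As a rough illustration of the site-key switch, here is a minimal sketch rather than our actual code. The class name, the `RECAPTCHA_TEST_MODE` flag, and the `resolve` helper are all hypothetical. The constant is the test site key as I recall it from Google's reCAPTCHA FAQ, so please verify it against the current documentation before relying on it.

```java
import java.util.Map;

// Hypothetical helper: picks which reCAPTCHA site key the login page should render.
final class RecaptchaSiteKey {

    // Assumption: test site key as listed in Google's reCAPTCHA FAQ for automated tests.
    // Check the current documentation before use.
    static final String GOOGLE_TEST_SITE_KEY = "6LeIxAcTAAAAAJcZVRqyHh71UMIEGNQ_MXjiZKhI";

    // Hypothetical flag name; any configuration mechanism would do.
    static final String TEST_MODE_VAR = "RECAPTCHA_TEST_MODE";

    private RecaptchaSiteKey() {
    }

    // Returns the test key when the flag is "true" (case-insensitive),
    // otherwise the production key that was passed in.
    static String resolve(Map<String, String> env, String productionKey) {
        boolean testMode = Boolean.parseBoolean(env.getOrDefault(TEST_MODE_VAR, "false"));
        return testMode ? GOOGLE_TEST_SITE_KEY : productionKey;
    }
}
```

Every code path that renders the widget, including the re-render after a validation error, would call something like `RecaptchaSiteKey.resolve(System.getenv(), productionKey)`, so the test key never gets swapped back to the real one mid-test. As far as I know, the server-side verification then needs the matching test secret key as well.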

(Unfortunately for us, our login page is plain JSP, so implementing this is a headache unless we want to change the URL everywhere. We are still debating what to do. For now we have a workable, if ugly, solution that involves some conditional logic on the page and catching JavaScript exceptions at other points in the test code.)

Erica Kane