I am trying to fetch a page with Java HtmlUnit, but it is slow. After investigating, I understand that this happens because HtmlUnit waits to load and apply JS. For me that is not necessary, because I already have the HTML as soon as the server response arrives.

What do I want?

I want to be able to start parsing immediately after getting the response (without loading JS or external resources such as CSS, iframes, etc.; just the plain HTML string). Is it possible?

My code example:

try (final WebClient webClient = new WebClient(BrowserVersion.CHROME)) {
    webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
    webClient.getOptions().setThrowExceptionOnScriptError(false);

    // get the page (this site is used just as an example)
    final HtmlPage page = webClient.getPage("https://godaddy.com/");

    // the call above takes a long time to return

    final DomNode articleNode = page.querySelector("body");
    final String articleText = articleNode.getTextContent();

} catch (Exception e) {
    e.printStackTrace();
}

And here is my HtmlUnit log output:

2018-12-08 16:01:56.049  WARN 19846 --- [   scheduling-1] c.g.htmlunit.DefaultCssErrorHandler      : CSS error: 'https://ua.godaddy.com/assets/    wrhs-assets/a864c140db4a4a32bd44db55efbefe54/uxcore2.min.css' [1:72403] Error in expression. (Invalid token " ". Was expecting one of: <NUMBER>,     "inherit", <IDENT>, <STRING>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <    ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, <UNICODE_RANGE>, <    URI>, <FUNCTION>, "progid:".)
2018-12-08 16:01:56.061  WARN 19846 --- [   scheduling-1] c.g.htmlunit.DefaultCssErrorHandler      : CSS error: 'https://ua.godaddy.com/assets/    wrhs-assets/a864c140db4a4a32bd44db55efbefe54/uxcore2.min.css' [1:88489] Error in expression. (Invalid token " ". Was expecting one of: <NUMBER>,     "inherit", <IDENT>, <STRING>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <    ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, <UNICODE_RANGE>, <    URI>, <FUNCTION>, "progid:".)
2018-12-08 16:01:56.063  WARN 19846 --- [   scheduling-1] c.g.htmlunit.DefaultCssErrorHandler      : CSS error: 'https://ua.godaddy.com/assets/    wrhs-assets/a864c140db4a4a32bd44db55efbefe54/uxcore2.min.css' [1:89596] Error in expression. (Invalid token " ". Was expecting one of: <NUMBER>,     "inherit", <IDENT>, <STRING>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <    ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, <UNICODE_RANGE>, <    URI>, <FUNCTION>, "progid:".)
2018-12-08 16:01:56.064  WARN 19846 --- [   scheduling-1] c.g.htmlunit.DefaultCssErrorHandler      : CSS error: 'https://ua.godaddy.com/assets/    wrhs-assets/a864c140db4a4a32bd44db55efbefe54/uxcore2.min.css' [1:89920] Error in expression. (Invalid token " ". Was expecting one of: <NUMBER>,     "inherit", <IDENT>, <STRING>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <    ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, <UNICODE_RANGE>, <    URI>, <FUNCTION>, "progid:".)
2018-12-08 16:01:56.064  WARN 19846 --- [   scheduling-1] c.g.htmlunit.DefaultCssErrorHandler      : CSS error: 'https://ua.godaddy.com/assets/    wrhs-assets/a864c140db4a4a32bd44db55efbefe54/uxcore2.min.css' [1:90001] Error in expression. (Invalid token " ". Was expecting one of: <NUMBER>,     "inherit", <IDENT>, <STRING>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <    ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, <UNICODE_RANGE>, <    URI>, <FUNCTION>, "progid:".)
2018-12-08 16:01:56.064  WARN 19846 --- [   scheduling-1] c.g.htmlunit.DefaultCssErrorHandler      : CSS error: 'https://ua.godaddy.com/assets/    wrhs-assets/a864c140db4a4a32bd44db55efbefe54/uxcore2.min.css' [1:90070] Error in expression. (Invalid token " ". Was expecting one of: <NUMBER>,     "inherit", <IDENT>, <STRING>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <    ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, <UNICODE_RANGE>, <    URI>, <FUNCTION>, "progid:".)
2018-12-08 16:01:56.072  WARN 19846 --- [   scheduling-1] c.g.htmlunit.DefaultCssErrorHandler      : CSS error: 'https://ua.godaddy.com/assets/    wrhs-assets/a864c140db4a4a32bd44db55efbefe54/uxcore2.min.css' [1:98871] Error in expression. (Invalid token " ". Was expecting one of: <NUMBER>,     "inherit", <IDENT>, <STRING>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <    ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, <UNICODE_RANGE>, <    URI>, <FUNCTION>, "progid:".)
2018-12-08 16:01:56.072  WARN 19846 --- [   scheduling-1] c.g.htmlunit.DefaultCssErrorHandler      : CSS error: 'https://ua.godaddy.com/assets/    wrhs-assets/a864c140db4a4a32bd44db55efbefe54/uxcore2.min.css' [1:98902] Error in expression. (Invalid token " ". Was expecting one of: <NUMBER>,     "inherit", <IDENT>, <STRING>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <    ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, <UNICODE_RANGE>, <    URI>, <FUNCTION>, "progid:".)
2018-12-08 16:01:56.073  WARN 19846 --- [   scheduling-1] c.g.htmlunit.DefaultCssErrorHandler      : CSS error: 'https://ua.godaddy.com/assets/    wrhs-assets/a864c140db4a4a32bd44db55efbefe54/uxcore2.min.css' [1:99516] Error in expression. (Invalid token " ". Was expecting one of: <NUMBER>,     "inherit", <IDENT>, <STRING>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <    ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, <UNICODE_RANGE>, <    URI>, <FUNCTION>, "progid:".)
2018-12-08 16:01:56.073  WARN 19846 --- [   scheduling-1] c.g.htmlunit.DefaultCssErrorHandler      : CSS error: 'https://ua.godaddy.com/assets/    wrhs-assets/a864c140db4a4a32bd44db55efbefe54/uxcore2.min.css' [1:99547] Error in expression. (Invalid token " ". Was expecting one of: <NUMBER>,     "inherit", <IDENT>, <STRING>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <    ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, <UNICODE_RANGE>, <    URI>, <FUNCTION>, "progid:".)
2018-12-08 16:01:56.147  WARN 19846 --- [   scheduling-1] c.g.htmlunit.DefaultCssErrorHandler      : CSS error: 'https://ua.godaddy.com/assets/    wrhs-assets/a864c140db4a4a32bd44db55efbefe54/uxcore2.min.css' [1:188511] Error in expression. (Invalid token " ". Was expecting one of: <NUMBER>,     "inherit", <IDENT>, <STRING>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <    ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, <UNICODE_RANGE>, <    URI>, <FUNCTION>, "progid:".)
2018-12-08 16:01:56.147  WARN 19846 --- [   scheduling-1] c.g.htmlunit.DefaultCssErrorHandler      : CSS error: 'https://ua.godaddy.com/assets/    wrhs-assets/a864c140db4a4a32bd44db55efbefe54/uxcore2.min.css' [1:188567] Error in expression. (Invalid token " ". Was expecting one of: <NUMBER>,     "inherit", <IDENT>, <STRING>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <    ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, <UNICODE_RANGE>, <    URI>, <FUNCTION>, "progid:".)
2018-12-08 16:01:56.180  WARN 19846 --- [   scheduling-1] c.g.htmlunit.DefaultCssErrorHandler      : CSS error: 'https://ua.godaddy.com/assets/    wrhs-assets/a864c140db4a4a32bd44db55efbefe54/uxcore2.min.css' [1:221027] Error in expression. (Invalid token " ". Was expecting one of: <NUMBER>,     "inherit", <IDENT>, <STRING>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <    ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, <UNICODE_RANGE>, <    URI>, <FUNCTION>, "progid:".)
2018-12-08 16:01:56.222  WARN 19846 --- [   scheduling-1] c.g.htmlunit.DefaultCssErrorHandler      : CSS error: 'https://ua.godaddy.com/assets/    wrhs-assets/1c6b4a67ff58cc9b92b0c6d2c6e48e4b/salesheader.min.css' [1:1512] Error in expression. (Invalid token " ". Was expecting one of: <NUMBER>,     "inherit", <IDENT>, <STRING>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <    ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, <UNICODE_RANGE>, <    URI>, <FUNCTION>, "progid:".)
2018-12-08 16:01:56.225  WARN 19846 --- [   scheduling-1] c.g.htmlunit.DefaultCssErrorHandler      : CSS error: 'https://ua.godaddy.com/assets/    wrhs-assets/1c6b4a67ff58cc9b92b0c6d2c6e48e4b/salesheader.min.css' [1:3780] Error in expression. (Invalid token " ". Was expecting one of: <NUMBER>,     "inherit", <IDENT>, <STRING>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <    ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, <UNICODE_RANGE>, <    URI>, <FUNCTION>, "progid:".)
2018-12-08 16:01:56.240  WARN 19846 --- [   scheduling-1] c.g.htmlunit.DefaultCssErrorHandler      : CSS error: 'https://ua.godaddy.com/assets/    wrhs-assets/1c6b4a67ff58cc9b92b0c6d2c6e48e4b/salesheader.min.css' [1:20465] Error in expression. (Invalid token " ". Was expecting one of: <NUMBER>,     "inherit", <IDENT>, <STRING>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <    ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, <UNICODE_RANGE>, <    URI>, <FUNCTION>, "progid:".)
2018-12-08 16:01:56.241  WARN 19846 --- [   scheduling-1] c.g.htmlunit.DefaultCssErrorHandler      : CSS error: 'https://ua.godaddy.com/assets/    wrhs-assets/1c6b4a67ff58cc9b92b0c6d2c6e48e4b/salesheader.min.css' [1:20654] Error in expression. (Invalid token " ". Was expecting one of: <NUMBER>,     "inherit", <IDENT>, <STRING>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <    ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, <UNICODE_RANGE>, <    URI>, <FUNCTION>, "progid:".)
2018-12-08 16:01:56.441  WARN 19846 --- [   scheduling-1] c.g.htmlunit.DefaultCssErrorHandler      : CSS error: 'https://ua.godaddy.com/assets/cms/sales/    css/sales-cms-db6c71f1156e11f73c0ffc77d891e668.min.css' [1:115341] Error in expression. (Invalid token "(". Was expecting one of: <S>, <NUMBER>,     "inherit", <IDENT>, <STRING>, "-", <PLUS>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <    ANGLE_DEG>, <ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, <    UNICODE_RANGE>, <URI>, <FUNCTION>, "progid:".)
2018-12-08 16:01:56.441  WARN 19846 --- [   scheduling-1] c.g.htmlunit.DefaultCssErrorHandler      : CSS error: 'https://ua.godaddy.com/assets/cms/sales/    css/sales-cms-db6c71f1156e11f73c0ffc77d891e668.min.css' [1:116052] Error in expression. (Invalid token "(". Was expecting one of: <S>, <NUMBER>,     "inherit", <IDENT>, <STRING>, "-", <PLUS>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <    ANGLE_DEG>, <ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, <    UNICODE_RANGE>, <URI>, <FUNCTION>, "progid:".)
2018-12-08 16:01:56.445  WARN 19846 --- [   scheduling-1] c.g.htmlunit.DefaultCssErrorHandler      : CSS error: 'https://ua.godaddy.com/assets/cms/sales/    css/sales-cms-db6c71f1156e11f73c0ffc77d891e668.min.css' [1:118964] Error in expression. (Invalid token "(". Was expecting one of: <S>, <NUMBER>,     "inherit", <IDENT>, <STRING>, "-", <PLUS>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <    ANGLE_DEG>, <ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, <    UNICODE_RANGE>, <URI>, <FUNCTION>, "progid:".)
2018-12-08 16:01:56.445  WARN 19846 --- [   scheduling-1] c.g.htmlunit.DefaultCssErrorHandler      : CSS error: 'https://ua.godaddy.com/assets/cms/sales/    css/sales-cms-db6c71f1156e11f73c0ffc77d891e668.min.css' [1:119073] Error in expression. (Invalid token "(". Was expecting one of: <S>, <NUMBER>,     "inherit", <IDENT>, <STRING>, "-", <PLUS>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <    ANGLE_DEG>, <ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, <    UNICODE_RANGE>, <URI>, <FUNCTION>, "progid:".)
2018-12-08 16:01:56.560  WARN 19846 --- [   scheduling-1] c.g.htmlunit.DefaultCssErrorHandler      : CSS error: 'https://ua.godaddy.com/assets/cms/sales/    css/sales-cms-db6c71f1156e11f73c0ffc77d891e668.min.css' [1:226900] Error in media list. (Invalid token "screen". Was expecting one of: <S>, "(".)
2018-12-08 16:01:56.560  WARN 19846 --- [   scheduling-1] c.g.htmlunit.DefaultCssErrorHandler      : CSS warning: 'https://ua.godaddy.com/assets/cms/    sales/css/sales-cms-db6c71f1156e11f73c0ffc77d891e668.min.css' [1:226900] Ignoring the whole rule.
2018-12-08 16:01:56.771  WARN 19846 --- [   scheduling-1] c.g.htmlunit.html.HtmlScript             : Script is not JavaScript (type: application/    ld+json, language: ). Skipping execution.
2018-12-08 16:02:01.040  WARN 19846 --- [   scheduling-1] c.g.htmlunit.IncorrectnessListenerImpl   : Obsolete content type encountered: 'text/    javascript'.
2018-12-08 16:02:01.410  WARN 19846 --- [   scheduling-1] c.g.htmlunit.IncorrectnessListenerImpl   : Obsolete content type encountered: 'text/    javascript'.
2018-12-08 16:02:01.857  WARN 19846 --- [   scheduling-1] c.g.htmlunit.IncorrectnessListenerImpl   : Obsolete content type encountered: 'application/    x-javascript'.
2018-12-08 16:02:02.787  INFO 19846 --- [   scheduling-1] c.g.h.javascript.JavaScriptEngine        : Caught script exception

com.gargoylesoftware.htmlunit.ScriptException: TypeError: Cannot call method "sameSizeGroup" of undefined (https://img1.wsimg.com/cms/sales/js/    sales-cms-4cebf668dc11f19307efd483ab9e770a.min.js#1)

2018-12-08 16:02:03.794  WARN 19846 --- [   scheduling-1] c.g.h.javascript.host.css.CSSStyleSheet  : Unhandled CSS condition type     'SUBSTRING_ATTRIBUTE_CONDITION'. Accepting it silently.
2018-12-08 16:02:03.796  WARN 19846 --- [   scheduling-1] c.g.h.javascript.host.css.CSSStyleSheet  : Unhandled CSS condition type     'SUBSTRING_ATTRIBUTE_CONDITION'. Accepting it silently.
2018-12-08 16:02:03.872  WARN 19846 --- [   scheduling-1] c.g.h.javascript.host.css.CSSStyleSheet  : Unhandled CSS condition type     'SUBSTRING_ATTRIBUTE_CONDITION'. Accepting it silently.
2018-12-08 16:02:03.874  WARN 19846 --- [   scheduling-1] c.g.h.javascript.host.css.CSSStyleSheet  : Unhandled CSS condition type     'SUBSTRING_ATTRIBUTE_CONDITION'. Accepting it silently.
2018-12-08 16:02:03.928  WARN 19846 --- [   scheduling-1] c.g.h.javascript.host.css.CSSStyleSheet  : Unhandled CSS condition type     'SUBSTRING_ATTRIBUTE_CONDITION'. Accepting it silently.
2018-12-08 16:02:03.929  WARN 19846 --- [   scheduling-1] c.g.h.javascript.host.css.CSSStyleSheet  : Unhandled CSS condition type     'SUBSTRING_ATTRIBUTE_CONDITION'. Accepting it silently.
2018-12-08 16:02:05.292  WARN 19846 --- [   scheduling-1] c.g.h.javascript.host.css.CSSStyleSheet  : Unhandled CSS condition type     'SUBSTRING_ATTRIBUTE_CONDITION'. Accepting it silently.
2018-12-08 16:02:05.294  WARN 19846 --- [   scheduling-1] c.g.h.javascript.host.css.CSSStyleSheet  : Unhandled CSS condition type     'SUBSTRING_ATTRIBUTE_CONDITION'. Accepting it silently.
2018-12-08 16:02:05.341  WARN 19846 --- [   scheduling-1] c.g.h.javascript.host.css.CSSStyleSheet  : Unhandled CSS condition type     'SUBSTRING_ATTRIBUTE_CONDITION'. Accepting it silently.
2018-12-08 16:02:05.367  WARN 19846 --- [   scheduling-1] c.g.h.javascript.host.css.CSSStyleSheet  : Unhandled CSS condition type     'SUBSTRING_ATTRIBUTE_CONDITION'. Accepting it silently.
2018-12-08 16:02:05.388  WARN 19846 --- [   scheduling-1] c.g.h.javascript.host.css.CSSStyleSheet  : Unhandled CSS condition type     'SUBSTRING_ATTRIBUTE_CONDITION'. Accepting it silently.
2018-12-08 16:02:05.410  WARN 19846 --- [   scheduling-1] c.g.h.javascript.host.css.CSSStyleSheet  : Unhandled CSS condition type     'SUBSTRING_ATTRIBUTE_CONDITION'. Accepting it silently.
2018-12-08 16:02:05.718  WARN 19846 --- [   scheduling-1] c.g.h.javascript.host.css.CSSStyleSheet  : Unhandled CSS condition type     'SUBSTRING_ATTRIBUTE_CONDITION'. Accepting it silently.
2018-12-08 16:02:05.720  WARN 19846 --- [   scheduling-1] c.g.h.javascript.host.css.CSSStyleSheet  : Unhandled CSS condition type     'SUBSTRING_ATTRIBUTE_CONDITION'. Accepting it silently.
2018-12-08 16:02:05.729  WARN 19846 --- [   scheduling-1] c.g.h.javascript.host.css.CSSStyleSheet  : Unhandled CSS condition type     'SUBSTRING_ATTRIBUTE_CONDITION'. Accepting it silently.
2018-12-08 16:02:09.029  WARN 19846 --- [   scheduling-1] c.g.h.javascript.host.css.CSSStyleSheet  : Unhandled CSS condition type     'PREFIX_ATTRIBUTE_CONDITION'. Accepting it silently.
2018-12-08 16:02:09.029  WARN 19846 --- [   scheduling-1] c.g.h.javascript.host.css.CSSStyleSheet  : Unhandled CSS condition type     'SUBSTRING_ATTRIBUTE_CONDITION'. Accepting it silently.
2018-12-08 16:02:09.562  WARN 19846 --- [   scheduling-1] c.g.htmlunit.IncorrectnessListenerImpl   : Obsolete content type encountered: 'text/    javascript'.
2018-12-08 16:02:11.017  WARN 19846 --- [   scheduling-1] c.g.htmlunit.IncorrectnessListenerImpl   : Obsolete content type encountered: 'text/    javascript'.
2018-12-08 16:02:11.599  WARN 19846 --- [   scheduling-1] c.g.htmlunit.IncorrectnessListenerImpl   : Obsolete content type encountered: 'text/    javascript'.    

PS. Dirty solution: I have already found this solution https://stackoverflow.com/a/14227559/4207348 but I think it's dirty. My code with this solution (it sped up my parsing about 20 times):

// use this site just for example
final String URL = "https://godaddy.com/";
try (final WebClient webClient = new WebClient(BrowserVersion.CHROME)) {

    webClient.setWebConnection(new WebConnectionWrapper(webClient) {
        @Override
        public WebResponse getResponse(final WebRequest request) throws IOException {
            if (request.getUrl().toString().contains(URL)) {
                return super.getResponse(request);
            } else {
                return new StringWebResponse("", request.getUrl());
            }
        }
    });

    webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
    webClient.getOptions().setThrowExceptionOnScriptError(false);

    //get page 
    final HtmlPage page = webClient.getPage(URL);

    //after line of code above, I have long response

    final DomNode articleNode = page.querySelector("body");
    final String articleText = articleNode.getTextContent();

} catch (Exception e){
    e.printStackTrace();
}
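
The `contains(URL)` test in the wrapper above only lets through requests whose URL contains the exact string `https://godaddy.com/`. The log shows resources served from `ua.godaddy.com`, so a host-based check may be more robust. The following is only a sketch using the standard library: the class name `SameSiteFilter` and the method `isSameSite` are hypothetical helpers, not part of HtmlUnit, and it assumes that subdomains of the site should be allowed while every other host is blocked.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.Locale;

// Hypothetical helper (not an HtmlUnit class): decides whether a request
// should be passed to the real connection or answered with an empty response.
public final class SameSiteFilter {

    private SameSiteFilter() {
    }

    // Returns true when the requested URL's host is siteHost itself
    // or one of its subdomains (assumption: subdomains belong to the site).
    public static boolean isSameSite(final URL requested, final String siteHost) {
        final String host = requested.getHost().toLowerCase(Locale.ROOT);
        final String site = siteHost.toLowerCase(Locale.ROOT);
        return host.equals(site) || host.endsWith("." + site);
    }

    public static void main(final String[] args) throws MalformedURLException {
        final URL page = URI.create("https://ua.godaddy.com/").toURL();
        final URL script = URI.create("https://img1.wsimg.com/cms/sales/js/example.js").toURL();
        System.out.println(isSameSite(page, "godaddy.com"));
        System.out.println(isSameSite(script, "godaddy.com"));
    }
}
```

Inside `getResponse`, the condition could then read `SameSiteFilter.isSameSite(request.getUrl(), "godaddy.com")`, since `request.getUrl()` is already used as a URL in the wrapper above.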
Volodymyr Bilovus

1 Answer

You can disable JavaScript and CSS like this:

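WebClient webClient = new WebClient();
// in the HtmlUnit version used in the question, these switches live on getOptions()
webClient.getOptions().setCssEnabled(false);
webClient.getOptions().setJavaScriptEnabled(false);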

Then you can get the raw HTML like this:

// reuse the webClient configured above so the disabled options still apply
HtmlPage page = webClient.getPage("http://www.yourpage.com");
String originalHtml = page.getWebResponse().getContentAsString();
Mohsen
  • Thanks for the answer, but it isn't what I needed. I don't have the ability to turn off JS, and HtmlUnit also analyzes all links and images, which slows down fetching. I have found just one solution to prevent this behavior (described in the PS section), but it's a dirty one... – Volodymyr Bilovus Dec 08 '18 at 21:47