
I am trying to write a program using HtmlUnit to scrape the source code of a website and return it. My code is currently:

public class Htmlunitscraper { 
  private static String s = "website";

  public static HtmlPage scrapeWebsite() throws IOException {
    final WebClient webClient = new WebClient();
    final HtmlPage page = webClient.getPage(s);

    return page.getPage();
  }
}

I thought the getPage method would return the source, but I keep getting errors and only the URL is returned. The output is:

Oct 16, 2013 4:07:59 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'application/x-javascript'.
Oct 16, 2013 4:08:00 PM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
SEVERE: runtimeError: message=[The data necessary to complete this operation is not yet available.] sourceName=[http://cpdocket.cp.cuyahogacounty.us/SheriffSearch/Scripts/jquery.js] line=[2] lineSource=[null] lineOffset=[0]
Oct 16, 2013 4:08:00 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'application/x-javascript'.
Oct 16, 2013 4:08:00 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'application/x-javascript'.
Oct 16, 2013 4:08:00 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'application/x-javascript'.
Oct 16, 2013 4:08:01 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'application/x-javascript'.
Oct 16, 2013 4:08:01 PM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
SEVERE: runtimeError: message=[The data necessary to complete this operation is not yet available.] sourceName=[http://cpdocket.cp.cuyahogacounty.us/SheriffSearch/ScriptResource.axd?d=0XCJGMnW_16F7h4EC7avEaQ_Ma7RLZvTA2-XkhkFcfSnWFOkCRjbat77Yi12o3uS3yGC-YMdXQ_w3i5MHWALH-xBqxutgCryrSWcT8prtHkRngrJRiKTP-EYEm1QJ6zB0&t=ffffffff823b7694] line=[2] lineSource=[null] lineOffset=[0]
Oct 16, 2013 4:08:01 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'application/x-javascript'.
HtmlPage(http://cpdocket.cp.cuyahogacounty.us/SheriffSearch/results.aspx?q=searchType%3dSaleDate%26searchString%3d10%2f21%2f2013%26foreclosureType%3d%27NONT%27%2c+%27PAR%27%2c+%27COMM%27%2c+%27TXLN%27)@1134201154

Am I using the wrong method to return the source? I can't find a good example of how to do this.

Ctech45

2 Answers


You should see the content of the page by doing:

System.out.println(page.asXml());

That will print it in a nicely formatted way.

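Everything else you are seeing is warnings and JavaScript errors that HtmlUnit logs while it runs the scripts of the page you're fetching. The last line of your output is not an error: page.getPage() returns the HtmlPage object itself, so printing it shows its toString() form, which contains only the URL.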

If you need the source exactly as the server sent it, without the reformatting that asXml() applies, page.getWebResponse().getContentAsString() should return it. See also this answer:

Check this answer to turn those warnings off:
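
One common way to silence that output is sketched below. It assumes HtmlUnit's logging ends up in java.util.logging, which the timestamped format of the messages in the question suggests; if another logging backend such as log4j is on the classpath, this would likely have no effect. The class name QuietHtmlUnitLogging is made up for illustration.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class QuietHtmlUnitLogging {
    // Keep a strong reference: java.util.logging holds loggers weakly, so an
    // unreferenced logger (and the level set on it) can be garbage collected.
    private static final Logger HTMLUNIT_LOGGER =
            Logger.getLogger("com.gargoylesoftware.htmlunit");

    /** Turns off every message logged under the HtmlUnit package name. */
    public static Logger silence() {
        HTMLUNIT_LOGGER.setLevel(Level.OFF);
        return HTMLUNIT_LOGGER;
    }

    public static void main(String[] args) {
        silence();
        // Loggers below the package, such as the one that produced the
        // "Obsolete content type" warnings, inherit the OFF level.
        Logger child = Logger.getLogger(
                "com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl");
        System.out.println(child.isLoggable(Level.WARNING)); // prints false
    }
}
```

Call QuietHtmlUnitLogging.silence() once before creating the WebClient. This only hides the messages; the scripts on the page still fail in the same way.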

Mosty Mostacho

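Try this code. It returns the page source as a String instead of the HtmlPage object:

import java.io.IOException;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class Htmlunitscraper {
    private static String s = "website";

    // asXml() serializes the page's DOM to text, so the return type is String.
    public static String scrapeWebsite() throws IOException {
        final WebClient webClient = new WebClient();
        final HtmlPage page = webClient.getPage(s);
        return page.asXml();
    }
}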
halfer