
Has anyone been able to make HtmlUnit run under Android?

I have a site which I am scraping using Jsoup (this works well). However, one of the sections spans more than two pages. The site uses ASP.NET, and the link that leads to the next page is a JavaScript postback. As a result, I need to somehow execute that JavaScript to get the next page's content. This is where my attempts at HtmlUnit come in.

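The following code worked perfectly on desktop Java:

import java.io.IOException;
import java.net.MalformedURLException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

// Enable JavaScript and keep going on HTTP or script errors.
WebClient webClient = new WebClient();
webClient.setJavaScriptEnabled(true);
webClient.setThrowExceptionOnFailingStatusCode(false);
webClient.setThrowExceptionOnScriptError(false);

HtmlPage page = null;
try {
    // Load the first page of results.
    page = webClient.getPage(URLOne.toString());

    // Click the "next" link, which triggers the ASP.NET postback.
    HtmlAnchor anchor = (HtmlAnchor) page.getAnchorByHref("javascript:__doPostBack('lb_next','')");
    page = (HtmlPage) anchor.click();
} catch (FailingHttpStatusCodeException e) {
    e.printStackTrace();
} catch (MalformedURLException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
} finally {
    webClient.closeAllWindows();
}

// Hand the rendered markup to Jsoup; page is still null if the first load failed.
Document doc1 = (page != null) ? Jsoup.parse(page.asXml()) : null;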

When I set up the necessary libraries in Android, I had to remove xalan, xerces and xml-apis (HtmlUnit on Android). If I keep them, I get the conversion to Dalvik error.

Without them the application runs in Android, but when it reaches the section that requires HtmlUnit I get several of the following errors in logcat:

Could not find method org.apache.http.conn.scheme.Scheme.<init>, referenced from method com.gargoylesoftware.htmlunit.HttpWebConnection.createHttpClient
Could not find method org.w3c.dom.css.CSSStyleDeclaration.getLength, referenced from method com.gargoylesoftware.htmlunit.javascript.host.css.ComputedCSSStyleDeclaration.applyStyleFromSelector
VFY: unable to find class referenced in signature (Lorg/w3c/dom/css/CSSStyleSheet;
VFY: unable to find class referenced in signature (Lorg/w3c/dom/css/CSSStyleDeclaration;

Then the application force closes. This issue is similar to "How do I get HtmlUnit to work under Android?" and "HtmlUnit Android problem with WebClient".

The only reason I am using HtmlUnit is to be able to run the JavaScript on that page. I am open to any alternative that would let me do something similar.
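
One possible alternative, sketched here under assumptions rather than as a tested solution: if the site uses the standard ASP.NET Web Forms `__doPostBack` script, clicking the link only fills the hidden fields `__EVENTTARGET` and `__EVENTARGUMENT` and submits the page's form, so the same request can be assembled without a JavaScript engine. The class name `PostBackFields` and its `build` method below are hypothetical helpers, not part of HtmlUnit or Jsoup, and the view-state value is a placeholder.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: turns a __doPostBack href into the form fields to POST.
public class PostBackFields {

    // Matches hrefs such as javascript:__doPostBack('lb_next','')
    private static final Pattern POSTBACK =
            Pattern.compile("__doPostBack\\('([^']*)'\\s*,\\s*'([^']*)'\\)");

    /**
     * Builds the fields for a postback request.
     *
     * @param href         the link's href, e.g. javascript:__doPostBack('lb_next','')
     * @param hiddenFields hidden inputs already scraped from the page
     *                     (for example __VIEWSTATE and __EVENTVALIDATION)
     */
    public static Map<String, String> build(String href, Map<String, String> hiddenFields) {
        Matcher m = POSTBACK.matcher(href);
        if (!m.find()) {
            throw new IllegalArgumentException("Not a __doPostBack href: " + href);
        }
        // Copy the scraped hidden inputs, then add the two postback fields.
        Map<String, String> fields = new LinkedHashMap<String, String>(hiddenFields);
        fields.put("__EVENTTARGET", m.group(1));   // first argument: the control id
        fields.put("__EVENTARGUMENT", m.group(2)); // second argument: often empty
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> hidden = new LinkedHashMap<String, String>();
        hidden.put("__VIEWSTATE", "placeholder-viewstate"); // placeholder value
        System.out.println(build("javascript:__doPostBack('lb_next','')", hidden));
    }
}
```

With Jsoup, the hidden inputs could be collected from the first page with `doc.select("input[type=hidden]")` and the resulting map sent with `Jsoup.connect(url).data(fields).post()`. Whether the server also expects session cookies or other form fields depends on the site, so treat this as a starting point only.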

Thanks

Elyas
  • I haven't succeeded yet. Have you had any success with HtmlUnit in Android? – bhekman Jun 20 '12 at 01:57
  • Nope, dead end. Given up on trying as well. – Elyas Jun 30 '12 at 07:05
  • If you still need JavaScript inside your Android project, check out Rhino. I got it working perfectly and it is MUCH smaller than HtmlUnit. Bobik is another suggested solution for web scraping. Check my thread here: http://stackoverflow.com/questions/11093130/scraping-dynamically-generated-html-inside-android-app – bhekman Jun 30 '12 at 23:18

2 Answers

DO NOT use HtmlUnit.

You would think you only need a couple of core jars. Nah, you might need all of them; otherwise you can run into class-not-found errors.

Just take a look at how many jars you have to load into Eclipse before you can run it! A total of 21 jars, over 10 MB! Bear in mind that you can only package up to 50 MB for Android Market. It also slows Eclipse down, and you will probably have to increase the memory when you debug.

Use Jsoup instead!

Yini

There is a class version clash between the HttpClient version that HtmlUnit uses and the HttpClient version partly integrated into the Android SDK.

To get around the problem you can use the distribution from the https://github.com/HtmlUnit/htmlunit-android project.

Please try and report any problems.

RBRi