
I want to scrape my website and then use the data from it to populate elements in my app. My website has login pages, and certain pages only open after the user has logged in.

I started working with HtmlUnit, as it is a headless browser, and completed the custom API in a Java IDE. Later I tried to use the JAR I generated from the Java IDE and found that HtmlUnit has incompatibility issues with Android.

Can anyone propose a solution to this problem?

Edit: Since no one actually answered this question, I am currently going with a workaround: I use Android's native WebView, set its visibility to invisible, and expose a Java object to JavaScript, so that I can inject JS code to scrape any data.
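A minimal sketch of that WebView workaround, under stated assumptions: the CSS selectors, the bridge name `HtmlBridge` and its `onResult` method are hypothetical, and the Android-only wiring (which needs the Android SDK) is shown as comments. The plain-Java part builds the JavaScript strings to inject and escapes user input, so that a password containing a quote, for example, cannot break the script. Clicking the submit button, rather than calling `form.submit()`, lets any JavaScript submit handlers on the page run.

```java
// Builds JavaScript strings to inject into a hidden Android WebView.
// Android-only wiring (needs the Android SDK), sketched here as comments:
//
//   webView.setVisibility(View.INVISIBLE);
//   webView.getSettings().setJavaScriptEnabled(true);
//   // HtmlBridge is a hypothetical class whose method
//   //   @JavascriptInterface public void onResult(String text) { ... }
//   // receives the scraped text on a WebView background thread.
//   webView.addJavascriptInterface(new HtmlBridge(), WebViewScripts.BRIDGE);
//   webView.setWebViewClient(new WebViewClient() {
//       @Override public void onPageFinished(WebView view, String url) {
//           view.evaluateJavascript(WebViewScripts.extractText("#content"), null);
//       }
//   });
final class WebViewScripts {
    // Hypothetical name under which the bridge object is exposed to page scripts.
    static final String BRIDGE = "HtmlBridge";

    private WebViewScripts() {}

    // Quotes a Java string as a single-quoted JavaScript string literal so that
    // injected values (passwords, selectors) cannot break out of the script.
    static String jsString(String s) {
        StringBuilder sb = new StringBuilder("'");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '\'': sb.append("\\'"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                default:
                    // Other control characters and the JS line separators
                    // U+2028/U+2029 are emitted as 4-digit hex escapes.
                    if (c < 0x20 || c == 0x2028 || c == 0x2029) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('\'').toString();
    }

    // Fills the login fields and clicks the submit button, as a user would.
    static String fillAndClick(String userSel, String passSel, String buttonSel,
                               String user, String pass) {
        return "(function(){"
                + "document.querySelector(" + jsString(userSel) + ").value=" + jsString(user) + ";"
                + "document.querySelector(" + jsString(passSel) + ").value=" + jsString(pass) + ";"
                + "document.querySelector(" + jsString(buttonSel) + ").click();"
                + "})();";
    }

    // Sends the text of the first element matching the selector back to Java
    // through the object registered with addJavascriptInterface.
    static String extractText(String selector) {
        return "(function(){var el=document.querySelector(" + jsString(selector) + ");"
                + BRIDGE + ".onResult(el ? el.innerText : '');})();";
    }
}
```

Because `addJavascriptInterface` exposes the Java object to every page loaded in that WebView, it is safest to load only your own site in it.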

Sujal Mandal
  • If you're scraping HTML from your own website to use the data in your own app, you're doing it incomprehensibly wrong. – Jonathon Reinhart Dec 26 '15 at 08:38
  • Can I ask why you need to scrape your own website? It would be much better to have your app connect to your server, via a special API if necessary, and pull information from there. – EkcenierK Dec 26 '15 at 08:40
  • I just want to do it that way. I like the idea of my app and website being two separate entities, with the app not accessing the internals of my website. So is there any solution? – Sujal Mandal Dec 26 '15 at 08:44
  • See my answer. I hope it helps you. – Zeeshan Shabbir Dec 26 '15 at 09:17
  • I think it is a great idea, because you can use Google Sites to create a free web page and have your app scrape that page for what to display; then no server is needed, and there is no server cost. This process also lets you change content across all installed apps instantly, with no upgrade to a newer version of the app needed to get new data. – pstorli Jul 17 '21 at 21:02

2 Answers


Use the Jsoup library for this purpose. It is very handy and easy to use. Start with this answer, then follow the documentation and other examples.
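Jsoup covers the login-then-fetch flow through its `Connection` API (for example `Jsoup.connect(url).data(...).method(Connection.Method.POST).execute()`, then passing `response.cookies()` to later requests with `.cookies(...)`). As a dependency-free sketch of the same idea, swapped in here so it runs without Jsoup, the following uses `HttpURLConnection` with a `CookieManager`; both exist on Android, where network calls must run off the main thread. The URLs and form field names you would pass to it are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Logs in once, then fetches pages that are only served with the session cookie.
final class SessionScraper {
    SessionScraper() {
        // With a default CookieHandler installed, HttpURLConnection stores
        // Set-Cookie headers and sends the cookies back on later requests.
        CookieHandler.setDefault(new CookieManager(null, CookiePolicy.ACCEPT_ALL));
    }

    // Encodes fields as application/x-www-form-urlencoded, in iteration order.
    static String formEncode(Map<String, String> fields) throws UnsupportedEncodingException {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(URLEncoder.encode(e.getKey(), "UTF-8")).append('=')
              .append(URLEncoder.encode(e.getValue(), "UTF-8"));
        }
        return sb.toString();
    }

    // POSTs the login form and returns the HTTP status; the session cookie
    // from the response ends up in the default CookieManager.
    int login(String loginUrl, Map<String, String> fields) throws IOException {
        byte[] body = formEncode(fields).getBytes(StandardCharsets.UTF_8);
        HttpURLConnection conn = (HttpURLConnection) URI.create(loginUrl).toURL().openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body);
            }
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    // GETs a page and returns its body as UTF-8 text; stored cookies are sent automatically.
    String fetch(String pageUrl) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(pageUrl).toURL().openConnection();
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) buf.write(chunk, 0, n);
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }
}
```

Because the cookie store is process-wide, any later `fetch` of a members-only page on the same host sends the session cookie; the returned HTML can then be parsed with `Jsoup.parse(html)` if Jsoup is available.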

jFrenetic
Zeeshan Shabbir
  • Thanks for trying to help, Zeeshan. I am trying to use jsoup for my purpose, but the problem is that a login page can have many hidden variables and can use JavaScript methods instead of direct submits, so the code won't be as straightforward as something like getPage().getForms[0].click(); – Sujal Mandal Dec 26 '15 at 09:33
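For the hidden-variable problem, one common approach is to fetch the login page first, collect every `<input type="hidden">` name/value pair, and post them back together with the credentials. The sketch below does that with a naive regex so it needs no libraries; it assumes reasonably regular markup (as on a site you control), does not decode HTML entities, and the field names in it are hypothetical. With jsoup, `doc.select("input[type=hidden]")` returns the same elements more robustly. Logins driven by JavaScript usually still end in an ordinary HTTP request, so inspecting that request in the browser's developer tools shows which fields to reproduce.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Collects <input type="hidden"> name/value pairs from a login page's HTML.
// Naive regex scan: fine for markup you control, not a general HTML parser;
// attribute values are returned as written (no entity decoding).
final class HiddenFields {
    // Matches a whole <input ...> tag, case-insensitively.
    private static final Pattern INPUT =
            Pattern.compile("<input\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    // Matches name=value with a double-quoted, single-quoted, or unquoted value.
    private static final Pattern ATTR = Pattern.compile(
            "([a-zA-Z_:][-a-zA-Z0-9_:.]*)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s>]+))");

    private HiddenFields() {}

    // Returns hidden fields in document order (name -> value).
    static Map<String, String> extract(String html) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher tag = INPUT.matcher(html);
        while (tag.find()) {
            Map<String, String> attrs = new LinkedHashMap<>();
            Matcher a = ATTR.matcher(tag.group());
            while (a.find()) {
                String value = a.group(2) != null ? a.group(2)
                        : a.group(3) != null ? a.group(3) : a.group(4);
                attrs.put(a.group(1).toLowerCase(Locale.ROOT), value);
            }
            if ("hidden".equalsIgnoreCase(attrs.get("type")) && attrs.containsKey("name")) {
                fields.put(attrs.get("name"), attrs.getOrDefault("value", ""));
            }
        }
        return fields;
    }

    // Hidden fields first, then the credentials (which win on a name clash).
    static Map<String, String> loginForm(String html, String userField, String user,
                                         String passField, String pass) {
        Map<String, String> form = extract(html);
        form.put(userField, user);
        form.put(passField, pass);
        return form;
    }
}
```

The resulting map can be form-encoded and posted to the form's `action` URL with whatever HTTP client the app already uses.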

If a real headless browser capable of handling all recent web features existed, it would mean that a team had developed it and kept investing considerable effort in it (supporting both existing and upcoming features) consistently.

Apart from the teams behind the Opera, Chrome, IE, and Firefox browsers, there is no such team. I would point to Chromium, via CEF (Chromium Embedded Framework), as the most open and actively supported option across languages. Try CEF for Java.

Community
Fab
  • I have read all those posts before, but in the end all of them say that HtmlUnit is incompatible with Android. Basically, I want a headless browser that can be programmed to surf like a real human would. – Sujal Mandal Dec 26 '15 at 08:48
  • Any request to find a recommended tool or technology is off-topic on Stack Overflow. – Fab Dec 26 '15 at 09:30
  • Hey Fab! I am quite sure I didn't ask anyone to name a technology or API. I want a solution for web scraping from an Android app, with user-friendly methods like those in HtmlUnit, since HtmlUnit is not compatible with Android. Can you think of a solution? – Sujal Mandal Dec 26 '15 at 09:35