What is the best way to scrape the HTML below from a web page? I want to pull out Apple, Orange, and Grape and put them into a dropdown menu in my Android app. Should I use jsoup for this, and if so, what would be the best way to do it? Or should I use regex instead?

<select name="fruit" id="fruit" >
<option value="APPLE">Apple</option>
<option value="ORANGE">Orange</option>
<option value="GRAPE">Grape</option>
</select>
alexD

3 Answers

Depends, but I'd go with an XML/HTML parser. Don't use regex.

Example with jsoup:

// Fetch and parse the page, then select every <option> inside <select id="fruit">.
Document doc = Jsoup.connect(someUrl).get();
Elements options = doc.select("select#fruit option");

More on jsoup selector syntax.


Best way?

I would go with either the built-in DOM parser or SAX parser. If you're going to be parsing a large document, SAX is faster. If the document is small, then there's not much difference. More on SAX vs DOM.
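
To illustrate the built-in DOM route, here is a minimal sketch using only the standard `javax.xml.parsers` API. The class name `FruitOptions` and the helper `optionLabels` are made up for this example, and it assumes the markup is well-formed XML, which the snippet in the question happens to be; a full real-world HTML page often is not, in which case this parser would throw.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class FruitOptions {
    // Hypothetical helper: returns the visible text of every <option> element.
    // Assumption: the markup is well-formed XML; malformed HTML will fail to parse.
    static List<String> optionLabels(String markup) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(markup)));

        NodeList options = doc.getElementsByTagName("option");
        List<String> labels = new ArrayList<>();
        for (int i = 0; i < options.getLength(); i++) {
            labels.add(options.item(i).getTextContent().trim());
        }
        return labels;
    }

    public static void main(String[] args) throws Exception {
        // The snippet from the question, inlined so the example runs on its own.
        String markup =
            "<select name=\"fruit\" id=\"fruit\" >"
            + "<option value=\"APPLE\">Apple</option>"
            + "<option value=\"ORANGE\">Orange</option>"
            + "<option value=\"GRAPE\">Grape</option>"
            + "</select>";
        System.out.println(optionLabels(markup)); // prints [Apple, Orange, Grape]
    }
}
```

In an Android app, the returned list could then back the dropdown, for example through an `ArrayAdapter<String>` set on a `Spinner`. Fetching the page over the network is left out here and would need to happen off the main thread.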

skyuzo

For HTML parsing you can use jsoup. The usage is very easy and the API is great.

http://jsoup.org/

For me it worked great!

EDIT: too slow :D skyuzo's post is great :)

dudeldidadum

WebView is your friend:

http://developer.android.com/reference/android/webkit/WebView.html

It lets you load HTML the way a browser does, and then you can do stuff with it. Note that JavaScript is disabled in a WebView by default, so I hope that's plain HTML you have there, not some AJAX-fetched or JS-generated code :)

Shivan Dragon
  • Thanks...skyuzo gave me the answer that I had in mind, but this is an interesting option and I'm definitely going to look into it. – alexD Sep 19 '11 at 19:44