1

I have an HTML file like the following:

...
<span itemprop="A">234</span>
...
<span itemprop="B">690</span>
...

From this I want to extract the values for A and B.
Can you suggest an HTML parser library for Java that can do this easily?

vivek_jonam

3 Answers

3

Personally, I favour JSoup over JTidy. It has CSS-like selectors, and the documentation is much better, imho. With JSoup, you can easily extract those values with the following lines:

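import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Document doc = Jsoup.connect("your_url").get(); // fetches and parses the page; get() can throw IOException
Elements spans = doc.select("span[itemprop]");  // every <span> that has an itemprop attribute

for (Element span : spans) {
  System.out.println(span.text()); // will print 234 and 690
}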
João Silva
  • I don't want to extract A and B but the other values 234 and 690 – vivek_jonam Aug 15 '12 at 14:27
  • @vivek_jonam: Then use `text()` instead, which gives you the content of `span`. I've edited my answer. – João Silva Aug 15 '12 at 14:28
  • OK, that works. But can I get only the values for A and B? There are other itemprop values such as A1, C, E, etc. – vivek_jonam Aug 15 '12 at 14:33
  • Yes, there are two ways of doing this. 1) When you are iterating over each span element, you can check if `span.attr("itemprop")` equals `A` or `B`; 2) You can run two selects, one with `span[itemprop=A]` and the other with `span[itemprop=B]`. – João Silva Aug 15 '12 at 14:34
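The two approaches from the last comment could look roughly like the sketch below. It parses an inline HTML string with `Jsoup.parse` rather than fetching a URL, so it runs without a network connection. The class name `ItempropExtractor`, the method names, and the extra `A1` span with value 17 are made up for illustration and are not from the original question.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ItempropExtractor {

    // Approach 1: select every span that has an itemprop attribute,
    // then keep only those whose attribute value is in the wanted set.
    public static Map<String, String> byFiltering(String html, Set<String> wanted) {
        Document doc = Jsoup.parse(html);
        Map<String, String> values = new LinkedHashMap<>();
        for (Element span : doc.select("span[itemprop]")) {
            String name = span.attr("itemprop");
            if (wanted.contains(name)) {
                values.put(name, span.text());
            }
        }
        return values;
    }

    // Approach 2: run one select per wanted value. [itemprop=A] is an
    // exact match, so it does not pick up itemprop="A1".
    // Returns null when no matching span exists.
    public static String bySelector(String html, String name) {
        Document doc = Jsoup.parse(html);
        Element span = doc.select("span[itemprop=" + name + "]").first();
        return span == null ? null : span.text();
    }

    public static void main(String[] args) {
        // Hypothetical sample markup; the A1 span is invented for the demo.
        String html = "<span itemprop=\"A\">234</span>"
                + "<span itemprop=\"A1\">17</span>"
                + "<span itemprop=\"B\">690</span>";

        System.out.println(byFiltering(html, Set.of("A", "B"))); // {A=234, B=690}
        System.out.println(bySelector(html, "B"));               // 690
    }
}
```

The first approach walks the document once and suits a longer list of wanted names; the second is shorter when only one or two values are needed.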
1

http://jsoup.org/

JSoup is the way to go.

srini.venigalla
1

JTidy is a confusingly named yet respected HTML parser.

Brian Agnew