
Possible Duplicate:
Is there a validating HTML parser implemented in Java?

Hi,

Is there any API for parsing HTML text in Java?

The parsed result should be exposed as objects.

For example, given the following text, I want to parse the HTML and have the parser return the list of tags, attributes, and so on.

<HTML>
<BODY>
    <INPUT TYPE="text" value="100">
</BODY>
</HTML>

Thanks

Vicky
  • Please search before you ask. There are **tons** of questions just like this. – Joachim Sauer Feb 10 '10 at 12:11
  • and tons of google results for "parse HTML java" – Bozho Feb 10 '10 at 12:16
  • @Bozho: that alone is not a reason not to post on here. – Joachim Sauer Feb 10 '10 at 12:22
  • it is for posting a question like "is there an API" - there is. It isn't a reason for not asking "which is a _good_ parsing API" – Bozho Feb 10 '10 at 12:33
  • @Bozho: When someone asks "is there an API" they **always** mean "which API should I use". Assuming anything else is just willfully ignoring the real question. It's not a good way to state that question, but it's also not useful to anyone to claim not to realize that something else was meant. – Joachim Sauer Feb 10 '10 at 12:34

3 Answers


Comprehensive list here

lucrussell

Refer to "HTML/XML Parser for Java", "Is there a validating HTML parser implemented in Java?", and finally "Which HTML Parser is the best?"

These should answer your question nicely.
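For a concrete idea of what "tags and attributes as objects" can look like, here is a minimal sketch that uses only the JDK's built-in Swing HTML parser (`javax.swing.text.html.parser.ParserDelegator`). The names `TagLister`, `TagInfo`, and `listTags` are made up for illustration. Note the assumptions: this parser targets old HTML (roughly 3.2) and is lenient, so it may report implied tags such as `head` that are not in the input, and the third-party libraries discussed in the linked questions are likely a better fit for modern markup.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class TagLister {

    /** Hypothetical value object: one tag name plus its attributes. */
    public static final class TagInfo {
        public final String name;
        public final Map<String, String> attributes;

        TagInfo(String name, Map<String, String> attributes) {
            this.name = name;
            this.attributes = attributes;
        }

        @Override
        public String toString() {
            return name + " " + attributes;
        }
    }

    /** Parses the given HTML and returns every opening or empty tag, in document order. */
    public static List<TagInfo> listTags(String html) {
        final List<TagInfo> tags = new ArrayList<>();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                tags.add(toInfo(tag, attrs));
            }

            // Empty elements such as <INPUT> are reported here rather than as start tags.
            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                tags.add(toInfo(tag, attrs));
            }
        };

        try {
            // The third argument tells the parser to ignore any charset declaration.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tags;
    }

    // Copies the attributes immediately, in case the parser reuses the attribute set.
    private static TagInfo toInfo(HTML.Tag tag, MutableAttributeSet attrs) {
        Map<String, String> copy = new LinkedHashMap<>();
        Enumeration<?> names = attrs.getAttributeNames();
        while (names.hasMoreElements()) {
            Object key = names.nextElement();
            copy.put(String.valueOf(key), String.valueOf(attrs.getAttribute(key)));
        }
        return new TagInfo(tag.toString(), copy);
    }

    public static void main(String[] args) {
        String html = "<HTML>\n<BODY>\n    <INPUT TYPE=\"text\" value=\"100\">\n</BODY>\n</HTML>";
        for (TagInfo tag : listTags(html)) {
            System.out.println(tag);
        }
    }
}
```

Running `main` on the snippet from the question prints one line per tag with its attribute map. The callback approach is event-based, so nothing is kept in memory beyond what you collect yourself; if you need a navigable tree instead, one of the DOM-style parsers from the linked lists is probably the way to go.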

Elister

Regexes should work just fine... cough

Richard Walton
  • +1 for humor! Also see the top response at http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Don Roby Feb 10 '10 at 12:38
  • Haha, yeah - I remember reading that on the Coding Horror blog. Good stuff indeed :) – Richard Walton Feb 10 '10 at 14:55