I want to parse an HTML file for these reasons:

  1. To get the content between tags, for example between a pair of paragraph tags
  2. To find occurrences of break tags
  3. To get the attributes of a tag, for example the value of COLOR in <FONT COLOR="red">

I need to do this in Java. I know the basics of the Jericho parser. How can I do this?

Peter Mortensen
Saicharan S M

2 Answers

There are a number of Java HTML parsers available.

You might also want to read through a comprehensive discussion of the pros and cons of each of these.

Community
Umer Hayat

If the HTML you want to parse is XHTML, it should also be well-formed XML, so any XML parser should be able to parse it.
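
As a rough sketch of the XML route, the three tasks from the question could look something like this using the DOM parser bundled with the JDK. The class name XhtmlDomSketch, its helper methods, and the sample markup are made up for illustration. It assumes the input really is well-formed, uses lowercase tag and attribute names (XML is case-sensitive, so FONT and font are different elements), and has no DOCTYPE; with a DOCTYPE that points at an external DTD, the parser may try to download that DTD.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
class XhtmlDomSketch {

    // Parses a well-formed XHTML string into a DOM tree.
    static Document parse(String xhtml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    // 1. Text content of every element with the given tag name.
    static List<String> textOf(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    // 2. Number of occurrences of a tag.
    static int count(Document doc, String tag) {
        return doc.getElementsByTagName(tag).getLength();
    }

    // 3. Attribute value of the first element with the given tag name,
    //    or null if there is no such element.
    static String firstAttribute(Document doc, String tag, String attribute) {
        NodeList nodes = doc.getElementsByTagName(tag);
        if (nodes.getLength() == 0) {
            return null;
        }
        return ((Element) nodes.item(0)).getAttribute(attribute);
    }

    public static void main(String[] args) {
        String xhtml = "<html><body>"
                + "<p>First<br/>paragraph</p>"
                + "<p><font color=\"red\">Second</font></p><br/>"
                + "</body></html>";
        Document doc = parse(xhtml);
        System.out.println(textOf(doc, "p"));                     // [Firstparagraph, Second]
        System.out.println(count(doc, "br"));                     // 2
        System.out.println(firstAttribute(doc, "font", "color")); // red
    }
}
```

Note that getTextContent() concatenates all text below an element, so the break tag inside the first paragraph leaves no trace in the returned string.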

If you cannot rely on that, you can search Google for HTML parsers for Java.
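
One option that needs no extra download is the callback-based HTML parser that ships with Swing in the JDK. This is not something the answer above recommends, only one possible sketch, and a dedicated library is likely to cope better with real-world pages. The class name LooseHtmlSketch, its fields, and the sample markup are invented for illustration; the sketch assumes the JDK's java.desktop module is available.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical callback that collects the three things asked about.
class LooseHtmlSketch extends HTMLEditorKit.ParserCallback {
    final List<String> paragraphs = new ArrayList<>();  // text inside <p>...</p>
    final List<String> fontColors = new ArrayList<>();  // COLOR values of <font>
    int breakCount = 0;                                 // number of <br> tags
    private StringBuilder current;                      // non-null while inside a <p>

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        if (tag == HTML.Tag.P) {
            current = new StringBuilder();
        } else if (tag == HTML.Tag.FONT) {
            Object color = attrs.getAttribute(HTML.Attribute.COLOR);
            if (color != null) {
                fontColors.add(color.toString());
            }
        }
    }

    @Override
    public void handleText(char[] data, int pos) {
        if (current != null) {
            current.append(data);
        }
    }

    @Override
    public void handleEndTag(HTML.Tag tag, int pos) {
        if (tag == HTML.Tag.P && current != null) {
            paragraphs.add(current.toString());
            current = null;
        }
    }

    // Tags without content, such as <br>, are reported here.
    @Override
    public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        if (tag == HTML.Tag.BR) {
            breakCount++;
        }
    }

    // Runs the Swing parser over the given HTML and returns the filled callback.
    static LooseHtmlSketch scan(String html) {
        LooseHtmlSketch callback = new LooseHtmlSketch();
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new IllegalStateException("Could not read HTML input", e);
        }
        return callback;
    }

    public static void main(String[] args) {
        LooseHtmlSketch result = scan("<html><body>"
                + "<P>First<BR>second</P>"
                + "<P><FONT COLOR=\"red\">Third</FONT></P>"
                + "</body></html>");
        System.out.println("Paragraphs: " + result.paragraphs);
        System.out.println("Breaks: " + result.breakCount);
        System.out.println("Font colors: " + result.fontColors);
    }
}
```

Unlike the XML approach, this does not require lowercase tags, closed break tags, or quoted attribute values, which is why it may suit HTML that is not XHTML.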

Jan Gräfen