0

I need to parse an HTML file in Java. Unlike XML, there are no repetitive tags, so I need code that can parse the HTML file and reach every node, including nested tags. The HTML is not fixed: given any HTML document, I need to reach all of its tags.

ROMANIA_engineer
Saicharan S M

2 Answers

1

Try the HTML Parser library: http://htmlparser.sourceforge.net/samples.html

Abhishek Choudhary
  • Hmm, I was not able to understand. Could you please elaborate? – Saicharan S M Mar 13 '12 at 06:30
  • This is an HTML parser you can use in Java. It returns the HTML content in an XML-like form: tags become nodes, alongside the text content. Check the examples. – Abhishek Choudhary Mar 13 '12 at 06:40
  • The examples are all command line. I couldn't find a Java example. Sorry for bugging you; I'm an amateur. – Saicharan S M Mar 13 '12 at 06:50
  • The examples may all be command line, but they also include links to the Javadocs of the relevant API classes involved. For example, in the entry for `Lexer`, it says, "Print the low level nodes of a web page" which sounds just like what you're looking for. It links to [here](http://htmlparser.sourceforge.net/javadoc/org/htmlparser/lexer/Lexer.html). The source code to the whole thing is also available for study. Now—what have you tried? – Alistair A. Israel Mar 13 '12 at 06:54
  • I have tried Jericho, JTidy, and jsoup, but I can't figure it out. I can't find any concrete example code anywhere on the net that parses an HTML file and reaches all of its tags. – Saicharan S M Mar 13 '12 at 06:59
0

I think you need this...

// Browser JavaScript, not Java: print the tag name of every element in the page.
var els = document.getElementsByTagName("*");
for (var i = 0; i < els.length; i++) document.write(els[i].nodeName + "<br />");
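
The snippet above runs in a browser, not in Java. A rough Java analogue, as a minimal sketch only, could use the HTML parser that ships with the JDK (`javax.swing.text.html`), so no extra library is needed. The class name `TagLister` and the method `listTags` are made up for illustration. This parser is old and lenient and may report tags it infers (for example an implied `head`), so treat the result as a starting point rather than an exact mirror of the source.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects the name of every tag the JDK's HTML parser reports.
final class TagLister {

    // Returns tag names in the order the parser encounters them, nested tags included.
    static List<String> listTags(String html) throws IOException {
        final List<String> tags = new ArrayList<>();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // Called for tags that have content, e.g. <div>, <p>, <b>.
                tags.add(tag.toString());
            }

            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // Called for empty tags such as <br> or <img>.
                tags.add(tag.toString());
            }
        };

        // The third argument tells the parser to ignore any charset declared in the document.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return tags;
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body><div><p>Hello <b>world</b></p><br></div></body></html>";
        for (String tag : listTags(html)) {
            System.out.println(tag);
        }
    }
}
```

To read from a file instead of a string, pass a `java.io.FileReader` (or any other `Reader`) to `parse`. If you need a real tree you can navigate, a dedicated library such as jsoup or the HTML Parser linked in the other answer is probably a better fit.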
CoffeeRain
Vinoth Kumar