0

I want to read an HTML file line by line and store its elements. For a textbox, I have to store the id, name, and type attribute values in some collection. In the same way, I need to get the attributes for checkboxes, radio buttons, etc.

Is there any API to parse an HTML file line by line?

  • Before attempting to parse HTML with anything, take a look at the top answer here: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – NickJ Mar 06 '14 at 12:30
  • 1
    Check this out http://stackoverflow.com/questions/2168610/which-html-parser-is-the-best – Laura Mar 06 '14 at 12:34

4 Answers

2

You can use a DOM parser and read all elements and attributes. Alternatively, you could use this library (jsoup), which parses HTML into a DOM-like tree that you can query.
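
To illustrate the DOM approach, here is a minimal sketch that uses only the JDK's built-in XML DOM parser. It assumes the input is well-formed XHTML (every tag closed, attributes quoted); real-world HTML often is not, which is where a lenient parser such as jsoup comes in. The class name `FormFieldCollector`, the method `collect`, and the sample markup are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: collects id/name/type from every element of a
// well-formed XHTML string without naming any specific tag.
public class FormFieldCollector {

    /** Returns one map per element that carries at least one of id, name or type. */
    public static List<Map<String, String>> collect(String xhtml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));

        List<Map<String, String>> fields = new ArrayList<>();
        // "*" matches every element, so no tag name is hardcoded
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element el = (Element) all.item(i);
            Map<String, String> attrs = new LinkedHashMap<>();
            for (String key : new String[] {"id", "name", "type"}) {
                if (el.hasAttribute(key)) {
                    attrs.put(key, el.getAttribute(key));
                }
            }
            if (!attrs.isEmpty()) {
                attrs.put("tag", el.getTagName()); // remember which element it was
                fields.add(attrs);
            }
        }
        return fields;
    }

    public static void main(String[] args) throws Exception {
        // Sample markup, assumed to be well-formed XHTML
        String page = "<html><body><form id=\"f\">"
                + "<input id=\"user\" name=\"user\" type=\"text\"/>"
                + "<input id=\"ok\" name=\"ok\" type=\"checkbox\"/>"
                + "<textarea name=\"notes\"></textarea>"
                + "</form></body></html>";
        for (Map<String, String> field : collect(page)) {
            System.out.println(field);
        }
    }
}
```

Because the loop walks every element, nothing like "input" or "textarea" has to be spelled out. In jsoup, `Element.getAllElements()` plays a similar role.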

Klemens Morbe
  • 1
    I would have recommended jsoup, simple and easy to use with very good documentation. +1 – tpbapp Mar 06 '14 at 12:38
  • I am reading the input elements with jsoup using `doc.getElementsByTag("input")`, and this lets me read the attribute values. The problem is that I should not hardcode the words "input", "form", or "textarea". –  Mar 06 '14 at 12:46
1

Use the StringBuilder class to read the whole file into a String:

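 import java.io.BufferedReader;
 import java.io.FileReader;
 import java.io.IOException;

 StringBuilder contentBuilder = new StringBuilder();
 // try-with-resources closes the reader even if reading fails
 try (BufferedReader in = new BufferedReader(new FileReader("mypage.html"))) {
      String str;
      while ((str = in.readLine()) != null) {
          // readLine() strips the line terminator, so add it back
          contentBuilder.append(str).append('\n');
      }
 } catch (IOException e) {
      System.err.println("HTML File Read Error: " + e.getMessage());
 }
 String content = contentBuilder.toString();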
Jaykumar Patel
0

No, since that doesn't make sense: HTML has no useful notion of "line". What you need to do is read the HTML element by element.

There are lots of parsers for XML, but HTML is more lenient, so you need a special parser for it. Try JTidy.
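
As a rough sketch of reading element by element rather than line by line, the example below uses the lenient HTML parser that ships with the JDK (`javax.swing.text.html.parser.ParserDelegator`) instead of JTidy, purely so that it needs no extra dependency. The class name `LenientFieldScanner`, the method `scan`, and the sample markup are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: receives one callback per element from the JDK's
// lenient HTML parser and records id/name/type where present.
public class LenientFieldScanner {

    public static List<String> scan(String html) throws IOException {
        List<String> found = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                record(tag, attrs); // elements with content, e.g. <textarea>
            }

            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                record(tag, attrs); // empty elements, e.g. <input>
            }

            private void record(HTML.Tag tag, MutableAttributeSet attrs) {
                Object id = attrs.getAttribute(HTML.Attribute.ID);
                Object name = attrs.getAttribute(HTML.Attribute.NAME);
                Object type = attrs.getAttribute(HTML.Attribute.TYPE);
                if (id != null || name != null || type != null) {
                    found.add(tag + " id=" + id + " name=" + name + " type=" + type);
                }
            }
        };
        // third argument true: ignore charset declarations inside the document
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Deliberately sloppy markup: unquoted attributes, unclosed tags
        String page = "<form><input type=checkbox name=agree>"
                + "<textarea name=notes></textarea><p>unclosed";
        for (String line : scan(page)) {
            System.out.println(line);
        }
    }
}
```

The callbacks fire per element regardless of how the markup is split across lines, which is why line boundaries do not matter to the parser.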

Aaron Digulla
0

NekoHTML is one of the many HTML parsers that you could use.

Hirak