
Can somebody help me get the elements from an HTML page? I don't want to use a library; I just want a pointer or two on how to use the correct regexes and such. I'm kind of stuck on this, so all help is appreciated.

Peter O.
user1681891
    Because he mentioned regex, and somebody was going to do it anyway. OP, [here](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) you go. – Perception Nov 17 '12 at 00:16

2 Answers


You'll have to decide first whether you're talking about HTML or XML.

If your text is a well-formed XML document, then you can use JAXP to parse the document and access elements/attributes programmatically (no need for regular expressions).
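A minimal sketch of the JAXP route, assuming the page really is well-formed XHTML. The class `JaxpLinks` and method `linkTargets` are hypothetical names; `DocumentBuilderFactory`, `DocumentBuilder.parse` and `getElementsByTagName` are the standard JAXP/DOM calls. Input that is not well-formed is rejected rather than guessed at.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class JaxpLinks {
    // Hypothetical helper: parses a well-formed XML/XHTML string with JAXP (DOM)
    // and returns the href attribute of every <a> element, in document order.
    static List<String> linkTargets(String xhtml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
            NodeList links = doc.getElementsByTagName("a");
            List<String> hrefs = new ArrayList<>();
            for (int i = 0; i < links.getLength(); i++) {
                Element a = (Element) links.item(i);
                hrefs.add(a.getAttribute("href")); // "" if the attribute is missing
            }
            return hrefs;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            // Input that is not well-formed XML (e.g. an unclosed <br>) ends up here.
            throw new IllegalArgumentException("not a well-formed XML document", e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html><body><a href=\"/one\">One</a> <a href=\"/two\">Two</a></body></html>";
        System.out.println(linkTargets(xhtml)); // prints [/one, /two]
    }
}
```

Feeding it everyday HTML, such as a `<br>` with no closing tag, makes the parser fail, which is exactly why the HTML-or-XML question matters.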

If your text is not a well-formed XML document, then no set of regular expressions is ever going to work for you in 100% of the cases; the best you can do is use the JDK's built-in HTML parser, which ships as part of the Swing framework (`javax.swing.text.html.parser`).
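To see why regular expressions fall short, here is a deliberately naive tag matcher. The class `NaiveRegexTags`, the method `matchTags` and the pattern are illustrative only, not a recommendation. It handles simple markup but is fooled by a tag inside a comment and by a `>` inside a quoted attribute value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NaiveRegexTags {
    // Naive pattern: "<", a tag name, then anything up to the FIRST ">".
    private static final Pattern TAG = Pattern.compile("<([a-zA-Z][a-zA-Z0-9]*)[^>]*>");

    // Returns the full text of every start tag the pattern finds.
    static List<String> matchTags(String html) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            tags.add(m.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        // Simple markup works: prints [<p>, <b>]
        System.out.println(matchTags("<p>Hi <b>there</b></p>"));
        // The commented-out <img> is reported as a real tag, and the <a> tag is
        // cut off at the ">" inside the attribute value: prints [<img src=x>, <a title="1 >]
        System.out.println(matchTags("<!-- <img src=x> --><a title=\"1 > 0\">x</a>"));
    }
}
```

Patching the pattern for each such case (comments, quoting, `<script>` contents, CDATA, and so on) quickly turns into writing a parser anyway.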

Isaac

The JDK includes a rudimentary HTML parser. It isn't very robust, but you did specify that you "don't want to use a library". So... knock yourself out, I suppose?
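A sketch of driving that parser through `javax.swing.text.html.parser.ParserDelegator` with an `HTMLEditorKit.ParserCallback`, assuming all you need is the `href` of each link. `SwingLinks` and `linkTargets` are hypothetical names. The parser works from an old HTML 3.2 DTD, so expect rough edges with modern markup, but it does tolerate unclosed tags that would make an XML parser give up.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class SwingLinks {
    // Hypothetical helper: feeds HTML to the JDK's Swing parser and collects
    // the href attribute of each <a> start tag it reports.
    static List<String> linkTargets(String html) {
        List<String> hrefs = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        hrefs.add(href.toString());
                    }
                }
            }
        };
        try {
            // true = ignore any charset declared inside the document itself
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return hrefs;
    }

    public static void main(String[] args) {
        // Unclosed <p> and <br> tags: not XML, but the Swing parser copes.
        String html = "<p>First<br><a href=\"/one\">One</a><p>Second <a href=\"/two\">Two</a>";
        System.out.println(linkTargets(html));
    }
}
```

Other callback methods such as `handleEndTag`, `handleSimpleTag` (for empty elements like `<br>` and `<img>`) and `handleText` can be overridden the same way if you need more than links.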

hd1