Can somebody help me get the elements from an HTML page? I don't want to use a library; I just want a pointer or two on how to use the correct regexes and such. I'm kind of stuck on this, so all help is appreciated.
Because he mentioned regex, and somebody was going to do it anyway. OP, [here](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) you go. – Perception Nov 17 '12 at 00:16
2 Answers
You'll have to decide first whether you're talking about HTML or XML.
If your text is a well-formed XML document, then you can use JAXP to parse the document and access elements and attributes programmatically (no need for regular expressions).
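A minimal sketch of the JAXP route, assuming the page is well-formed XHTML already held in a `String`. The class and method names (`JaxpElementLister`, `elementNames`) are made up for illustration; the calls themselves (`DocumentBuilderFactory`, `Document.getElementsByTagName("*")`) are standard JDK APIs.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class JaxpElementLister {

    // Hypothetical helper: parses well-formed XML (e.g. XHTML) and returns
    // the tag name of every element in document order.
    // Throws SAXException if the input is not well-formed.
    static List<String> elementNames(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // "*" matches every element, including the root element.
        NodeList nodes = doc.getElementsByTagName("*");
        List<String> names = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            names.add(e.getTagName());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html><body><p>Hi <a href=\"x.html\">link</a></p></body></html>";
        System.out.println(elementNames(xhtml)); // prints [html, body, p, a]
    }
}
```

Attributes work the same way: `Element.getAttribute("href")` on one of those elements returns its value. Note that ordinary HTML (unclosed `<p>`, bare `<br>`) is rejected with a `SAXException`, which is exactly why the XML/HTML distinction matters.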
If your text is not a well-formed XML document, then no set of regular expressions is ever going to work for you in 100% of the cases; the best you can do is use the JDK's built-in HTML parser, which ships as part of the Swing framework (`javax.swing.text.html.parser`).
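A rough sketch of the Swing parser approach: `ParserDelegator` walks the markup and reports each tag to an `HTMLEditorKit.ParserCallback`. The class and method names (`SwingTagLister`, `tagNames`) are hypothetical. The parser is built on an old HTML 3.2 DTD, so it inserts tags it considers implied (flagged with `ParserCallback.IMPLIED`) and reports unrecognized tags through `handleSimpleTag`; the sketch filters out both implied tags and the end-tag notifications for unknown elements.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class SwingTagLister {

    // Hypothetical helper: returns the names of the tags that actually appear
    // in the markup, in the order the Swing parser reports them.
    static List<String> tagNames(String html) throws IOException {
        List<String> names = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                record(t, a);
            }

            @Override
            public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                // Empty elements such as <br> and <img> (and unknown tags) arrive here.
                record(t, a);
            }

            private void record(HTML.Tag t, MutableAttributeSet a) {
                boolean implied = a.getAttribute(IMPLIED) != null;              // inserted by the parser
                boolean endTag = a.getAttribute(HTML.Attribute.ENDTAG) != null; // e.g. </section>
                if (!implied && !endTag) {
                    names.add(t.toString());
                }
            }
        };
        // true: ignore any charset declared in a <meta> tag; the input is already text.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return names;
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body><p>Hi <a href=\"x.html\">link</a><br>bye</body></html>";
        System.out.println(tagNames(html));
    }
}
```

Unlike the JAXP version, this tolerates the unclosed `<p>` in the sample input; the trade-off is that its idea of "valid" HTML is frozen at HTML 3.2.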

Isaac
The JDK includes a rudimentary HTML parser. It isn't very robust, but you did specify that you "don't want to use a library". So... knock yourself out, I suppose?
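Another way to drive that same built-in parser, sketched here under the assumption that the goal is to pull out links: let `HTMLEditorKit` load the page into an `HTMLDocument` (the model `JEditorPane` uses) and walk every `<a>` with `HTMLDocument.getIterator(HTML.Tag.A)`. The names `SwingLinkExtractor` and `linkHrefs` are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

public class SwingLinkExtractor {

    // Hypothetical helper: loads the markup into an HTMLDocument and returns
    // the href of every <a> element, in document order.
    static List<String> linkHrefs(String html) throws IOException, BadLocationException {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        // Keep going even if the page declares a charset in a <meta> tag;
        // the Reader already supplies decoded characters.
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        kit.read(new StringReader(html), doc, 0);

        List<String> hrefs = new ArrayList<>();
        for (HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.A); it.isValid(); it.next()) {
            // The iterator exposes the attributes that were on the <a> tag itself.
            AttributeSet attrs = it.getAttributes();
            Object href = attrs.getAttribute(HTML.Attribute.HREF);
            if (href != null) {
                hrefs.add(href.toString());
            }
        }
        return hrefs;
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><body><p><a href=\"a.html\">first</a> and "
                + "<a href=\"b.html\">second</a></p></body></html>";
        System.out.println(linkHrefs(html));
    }
}
```

This builds a full styled document, so it is heavier than the callback approach, but it saves you from tracking open and close tags yourself.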

hd1