I need to parse a simple HTML page with a simple form in it. The answers to similar questions on StackOverflow suggest using one of a large variety of non-standard Java libraries such as TagSoup, JSoup, HTMLParser and many others.
However, a web search revealed that there exists some standard functionality in Java SE via this class: http://docs.oracle.com/javase/7/docs/api/javax/swing/text/html/parser/ParserDelegator.html
My sub-questions are:
- Is it really true that the standard ParserDelegator class can parse a use case like mine?
- What are the limitations of the standard library that create the need for so many non-standard libraries?
- Does the fact that ParserDelegator is within swing preclude using it in a regular EC2 cloud server for a web application? Would I have to jump through a lot of hoops to get around the headless aspect or would it be just a small tweak to the configuration?
- If the standard one is not recommended, which non-standard one should I use, given: (a) my desire to not stray far from the standard; (b) my simple use case; (c) desire for a mature reliable implementation; and (d) no size or weight limitations since this is a server application as opposed to an embedded client. API is a far lower priority so while I do appreciate JSoup's CSS selector like API, the other concerns (a) through (d) override it.
Thank you.