An XML parser provides a way to access or modify the data in an XML document. Java offers several options for parsing XML; the following parsers are the ones most commonly used.
DOM Parser - Parses the document by loading its entire contents and building the complete hierarchical tree in memory.
SAX Parser - Parses the document using event-based callbacks. It does not load the complete document into memory.
JDOM Parser - Parses the document in a similar fashion to the DOM parser, but in an easier way.
StAX Parser - Streams through the document like the SAX parser, but the application pulls events on demand instead of receiving callbacks.
XPath Parser - Parses the XML based on expressions and is used extensively in conjunction with XSLT.
DOM4J Parser - A Java library for working with XML, XPath and XSLT using the Java Collections Framework; it provides support for DOM, SAX and JAXP.
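To make the differences concrete, here is a minimal sketch that extracts the same data with the four APIs bundled with the JDK (DOM, SAX, StAX and XPath, via `javax.xml.*`, `org.w3c.dom` and `org.xml.sax`). The class name `XmlParsersSketch`, its helper methods and the sample `<library>` document are hypothetical, chosen only for illustration. JDOM and DOM4J are third-party libraries and are not shown here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class XmlParsersSketch {

    // Hypothetical sample document, used only for illustration.
    public static final String XML =
        "<library><book id=\"1\"><title>Dune</title></book>"
        + "<book id=\"2\"><title>Emma</title></book></library>";

    // Copies the text of every node in a NodeList into a Java list.
    private static List<String> texts(NodeList nodes) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i).getTextContent());
        }
        return out;
    }

    // DOM: parse the whole document into an in-memory tree, then query it.
    public static List<String> domTitles(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        return texts(doc.getElementsByTagName("title"));
    }

    // SAX: the parser pushes events to a handler; no tree is kept.
    public static List<String> saxTitles(String xml) throws Exception {
        List<String> titles = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside <title>

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals("title")) text = new StringBuilder();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("title")) {
                    titles.add(text.toString());
                    text = null;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return titles;
    }

    // StAX: the application pulls events from the reader one at a time.
    public static List<String> staxTitles(String xml) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
            .createXMLStreamReader(new StringReader(xml));
        List<String> titles = new ArrayList<>();
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && reader.getLocalName().equals("title")) {
                titles.add(reader.getElementText()); // reads up to </title>
            }
        }
        reader.close();
        return titles;
    }

    // XPath: select nodes with a path expression instead of walking the tree by hand.
    public static List<String> xpathTitles(String xml) throws Exception {
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
            .evaluate("/library/book/title",
                      new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        return texts(nodes);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("DOM:   " + domTitles(XML));
        System.out.println("SAX:   " + saxTitles(XML));
        System.out.println("StAX:  " + staxTitles(XML));
        System.out.println("XPath: " + xpathTitles(XML));
    }
}
```

Each of the four methods should return `[Dune, Emma]` for the sample document. The DOM version keeps the whole tree in memory, while the SAX and StAX versions only ever see one event at a time, which is why the streaming parsers are usually preferred for very large files.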
Please read up on the options above. Also, if you are working with HTML directly, consider jsoup.
jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jQuery-like methods.