174

I'm searching for a Java library for parsing XML (complex configuration and data files). I googled a bit but couldn't find anything other than dom4j (it seems they are working on V2). I have taken a look at Commons Configuration but didn't like it, and other Apache projects on XML seem to be in hibernation. I haven't evaluated dom4j myself, but just wanted to know: does Java have other (good) open source XML parsing libraries, and how's your experience with dom4j?

After @Voo's answer, let me ask another one: should I use Java's built-in classes or a third-party library like dom4j? What are the advantages?

Ilonpilaaja
Premraj

7 Answers

232

Actually Java supports 4 methods to parse XML out of the box:

DOM Parser/Builder: The whole XML structure is loaded into memory and you can use the well-known DOM methods to work with it. DOM also allows you to write the document back out with XSLT transformations. Example:

public static void parse() throws ParserConfigurationException, IOException, SAXException {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setValidating(true);
    factory.setIgnoringElementContentWhitespace(true);
    DocumentBuilder builder = factory.newDocumentBuilder();
    File file = new File("test.xml");
    Document doc = builder.parse(file);
    // Do something with the document here.
}
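To make the "write with XSLT transformations" remark concrete, here is a minimal sketch that parses a string into a DOM tree, appends an element, and serializes the tree with an identity `Transformer`. The class name `DomRoundTrip`, its helper methods, and the sample XML are made up for illustration; only the `javax.xml` and `org.w3c.dom` calls come from the JDK.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original answer.
class DomRoundTrip {
    // Parses an XML string into an in-memory DOM tree.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Appends <name>text</name> as the last child of the root element.
    static void addElement(Document doc, String name, String text) {
        Element child = doc.createElement(name);
        child.setTextContent(text);
        doc.getDocumentElement().appendChild(child);
    }

    // Serializes the tree with an identity transform (no stylesheet supplied).
    static String toXml(Document doc) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<config><port>8080</port></config>");
        addElement(doc, "host", "localhost");
        System.out.println(toXml(doc));
    }
}
```

The whole document lives in memory between `parse` and `toXml`, which is what makes in-place edits like `addElement` possible and also what makes DOM expensive for large files.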

SAX Parser: Solely for reading an XML document. The SAX parser runs through the document and calls the user's callback methods. There are methods for the start/end of a document, of an element, and so on. They're defined in org.xml.sax.ContentHandler, and there's an empty helper class, org.xml.sax.helpers.DefaultHandler, that you can extend.

public static void parse() throws ParserConfigurationException, SAXException, IOException {
    SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.setValidating(true);
    SAXParser saxParser = factory.newSAXParser();
    File file = new File("test.xml");
    saxParser.parse(file, new ElementHandler());    // specify handler
}
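The snippet above passes an `ElementHandler` without showing it. One possible shape for such a handler is sketched below; the counting logic and the `count` helper are assumptions made for illustration, not the class from the original slides.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: counts how often each element name occurs.
class ElementHandler extends DefaultHandler {
    final Map<String, Integer> counts = new LinkedHashMap<>();

    // Called by the parser once for every opening tag it encounters.
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        counts.merge(qName, 1, Integer::sum);
    }

    // Convenience wrapper: runs the SAX parser over an in-memory string.
    static Map<String, Integer> count(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        ElementHandler handler = new ElementHandler();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.counts;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(count("<a><b/><b/><c/></a>"));
    }
}
```

Because the parser pushes events at the handler, any state you need across callbacks (here the `counts` map) has to be kept in the handler itself.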

StAX Reader/Writer: This works with a stream-oriented interface. The program asks for the next element when it's ready, just like a cursor/iterator. You can also create documents with it. Read document:

public static void parse() throws XMLStreamException, IOException {
    try (FileInputStream fis = new FileInputStream("test.xml")) {
        XMLInputFactory xmlInFact = XMLInputFactory.newInstance();
        XMLStreamReader reader = xmlInFact.createXMLStreamReader(fis);
        while(reader.hasNext()) {
            reader.next(); // do something here
        }
    }
}
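The "do something here" step in the loop above usually means checking the event type that `next()` returns. A small sketch, assuming a hypothetical `StaxPull.textOf` helper that collects the text of every element with a given name:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, not part of the original answer.
class StaxPull {
    // Pulls events one at a time and collects the text of matching elements.
    // Assumes the matching elements contain text only; getElementText()
    // throws an XMLStreamException if it meets a nested element.
    static List<String> textOf(String xml, String elementName) throws XMLStreamException {
        List<String> values = new ArrayList<>();
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(elementName)) {
                    values.add(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return values;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(textOf("<items><item>a</item><item>b</item></items>", "item"));
    }
}
```

Unlike SAX, the application drives the loop here, so it can stop reading as soon as it has what it needs.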

Write document:

public static void parse() throws XMLStreamException, IOException {
    try (FileOutputStream fos = new FileOutputStream("test.xml")) {
        XMLOutputFactory xmlOutFact = XMLOutputFactory.newInstance();
        XMLStreamWriter writer = xmlOutFact.createXMLStreamWriter(fos);
        writer.writeStartDocument();
        writer.writeStartElement("test");
        // write stuff
        writer.writeEndElement();
        writer.writeEndDocument();
        writer.flush(); // push any buffered output to the stream
        writer.close(); // frees the writer; the stream itself is closed by try-with-resources
    }
}

JAXB: The newest of the four. Version 2 is part of Java 6 (it was removed from the JDK in Java 11 and is now a separate dependency). It lets you deserialize Java objects from a document. You read the document with an implementation of javax.xml.bind.Unmarshaller, which you obtain by calling createUnmarshaller() on a JAXBContext created with JAXBContext.newInstance. The context has to be initialized with the classes used, but you only have to specify the root classes and don't have to worry about statically referenced classes. You use annotations to specify which classes are elements (@XmlRootElement) and which fields are elements (@XmlElement) or attributes (@XmlAttribute, what a surprise!). Read document:

public static void parse() throws JAXBException, IOException {
    try (FileInputStream adrFile = new FileInputStream("test.xml")) {
        JAXBContext ctx = JAXBContext.newInstance(RootElementClass.class);
        Unmarshaller um = ctx.createUnmarshaller();
        RootElementClass rootElement = (RootElementClass) um.unmarshal(adrFile);
    }
}

Write document:

public static void parse(RootElementClass out) throws IOException, JAXBException {
    try (FileOutputStream adrFile = new FileOutputStream("test.xml")) {
        JAXBContext ctx = JAXBContext.newInstance(RootElementClass.class);
        Marshaller ma = ctx.createMarshaller();
        ma.marshal(out, adrFile);
    }
}

Examples shamelessly copied from some old lecture slides ;-)

Edit: About "which API should I use?". Well, it depends: as you can see, not all APIs have the same capabilities. If you have control over the classes you use to map the XML document, JAXB is my personal favorite: a really elegant and simple solution (though I haven't used it for really large documents, where it could get a bit complex). SAX is pretty easy to use too. Just stay away from DOM if you don't have a really good reason to use it; it's an old, clunky API in my opinion. I don't think there are any modern third-party libraries that offer anything especially useful that's missing from the standard library, and the standard APIs have the usual advantages of being extremely well tested, documented and stable.

Lakshmikant Deshpande
Voo
  • @Natix that's what the "edit" option is for. Should be better now. – Kikiwa Sep 20 '16 at 21:00
  • 4
    @Kikiwa Exception handling is about as much removed from the point of this post as possible. If some incompetent copy-paste programmer goes ahead and copies snippets without understanding their purpose they get what they deserve. Not really worried or interested about them. What I will say is that removing the try/catch blocks and showing the method signature instead to document what exceptions the different options can throw would save space while still preserving the interesting information. So if someone wants to do that, they should just go ahead. – Voo Sep 20 '16 at 21:10
  • 1
    (At the same time I'll reject edits that remove the try/catch without denoting the additional information in some other way) – Voo Sep 20 '16 at 21:12
  • I believe JAXB is no longer included with the JDK in recent versions. – Slaw Mar 17 '19 at 14:46
12

Java supports two methods for XML parsing out of the box.

SAXParser

You can use this parser if you want to parse large XML files and/or don't want to use a lot of memory.

http://download.oracle.com/javase/6/docs/api/javax/xml/parsers/SAXParserFactory.html

Example: http://www.mkyong.com/java/how-to-read-xml-file-in-java-sax-parser/

DOMParser

You can use this parser if you need to do XPath queries or need to have the complete DOM available.

http://download.oracle.com/javase/6/docs/api/javax/xml/parsers/DocumentBuilderFactory.html

Example: http://www.mkyong.com/java/how-to-read-xml-file-in-java-dom-parser/
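As a rough sketch of the XPath use case mentioned above, the JDK's `javax.xml.xpath` package can evaluate an expression against a parsed DOM document. The class name `DomXPath`, the `select` helper, and the sample document are invented for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not taken from the linked tutorials.
class DomXPath {
    // Builds the full DOM, then returns the text content of every node
    // matched by the XPath expression.
    static List<String> select(String xml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<servers><server env='prod'>a</server><server env='dev'>b</server></servers>";
        System.out.println(select(xml, "/servers/server[@env='prod']"));
    }
}
```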

eis
RAJH
9

If you want a DOM-like API - that is, one where the XML parser turns the document into a tree of Element and Attribute nodes - then there are at least four to choose from: DOM itself, JDOM, DOM4J, and XOM. The only possible reason to use DOM is because it's perceived as a standard and is supplied in the JDK: in all other respects, the others are all superior. My own preference, for its combination of simplicity, power, and performance, is XOM.

And of course, there are other styles of processing: low-level parser interfaces (SAX and StAX), data-object binding interfaces (JAXB), and high-level declarative languages (XSLT, XQuery, XPath). Which is best for you depends on your project requirements and your personal taste.
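For the declarative style, a minimal sketch using the XSLT 1.0 processor bundled with the JDK (`javax.xml.transform`) might look like this. The stylesheet, the sample input, and the `XsltDemo` class are assumptions made for illustration; XQuery and newer XSLT versions need a third-party processor.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class, not part of the original answer.
class XsltDemo {
    // Example stylesheet: emits each server's name attribute followed by a comma.
    static final String STYLESHEET =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='text'/>"
          + "<xsl:template match='/'>"
          + "<xsl:for-each select='servers/server'><xsl:value-of select='@name'/>,</xsl:for-each>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    // Compiles the stylesheet and applies it to the input document.
    static String transform(String xslt, String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<servers><server name='a'/><server name='b'/></servers>";
        System.out.println(transform(STYLESHEET, xml));
    }
}
```

The Java code only wires input to output; what gets extracted is stated entirely in the stylesheet.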

Michael Kay
  • 2
    DOM is a W3C standard (http://www.w3.org/DOM/). The Java implementation of this standard is covered by the JAXP standard (http://jcp.org/en/jsr/detail?id=206). JAXP is then implemented by different providers such as: Oracle, Apache, etc. – bdoughan Feb 22 '11 at 14:15
  • Indeed, no-one would use DOM at all if it weren't that (a) it was defined as a standard and has multiple implementations, and (b) it's included in the JDK by default. From all other perspectives, JDOM2 and XOM are much preferable. – Michael Kay Mar 02 '20 at 11:06
4

Nikita's point is an excellent one: don't confuse mature with bad. XML hasn't changed much.

JDOM would be another alternative to DOM4J.

duffymo
  • Which one will you choose and why? – Premraj Feb 20 '11 at 19:14
  • 1
    It doesn't much matter. Both are wrappers of the SAX and DOM parsers built into the JDK. The W3C Document hierarchy is verbose and hard to use, so both DOM4J and JDOM try to make it easier. I like Elliotte Rusty Harold, so I tend to reach for JDOM first. – duffymo Feb 20 '11 at 19:51
4

You don't need an external library for parsing XML in Java. Java has come with built-in implementations for SAX and DOM for ages.

ChrisJ
3

For folks interested in using JDOM but worried that it hasn't been updated in a while (in particular, that it doesn't leverage Java generics), there is a fork called CoffeeDOM which addresses exactly these points and modernizes the JDOM API. Read more here:

http://cdmckay.org/blog/2011/05/20/introducing-coffeedom-a-jdom-fork-for-java-5/

and download it from the project page at:

https://github.com/cdmckay/coffeedom

reevesy
ngeek
0

VTD-XML is the heavy-duty XML parsing library; it is better than the others in virtually every way. Here is a 2013 paper that analyzes all the XML processing frameworks available on the Java platform:

http://sdiwc.us/digitlib/journal_paper.php?paper=00000582.pdf

vtd-xml-author
  • 3
    A warning: VTD-XML is licensed under the GPL, which effectively rules it out in the vast majority of professional or commercial development situations. Engineers should consult their own attorney for an analysis, but if you are paid to do engineering then you will most likely find that your organization does not (and cannot) allow the use of any libraries licensed under the GPL. – Sarah G Jun 01 '18 at 04:39
  • That link is dead – null Jul 07 '19 at 07:53