31

I'm trying to find a way to validate a large XML file against an XSD. I saw the question ...best way to validate an XML... but the answers all pointed to using the Xerces library for validation. The only problem is that when I use that library to validate a 180 MB file, I get an OutOfMemoryError.

Are there any other tools, libraries, or strategies for validating a larger-than-normal XML file?

EDIT: The SAX solution worked for Java validation, but the other two suggestions for the libxml tool were very helpful as well for validation outside of Java.

Community
Dan Cramer
  • 1
    For an easy to use Windows tool you can use [XML ValidatorBuddy](http://www.xml-tools.com/ValidatorBuddy.htm) which uses the Xerces SAX parser internally to validate huge files. – Clemens Sep 02 '11 at 06:16

4 Answers

30

Instead of using a DOMParser, use a SAXParser. A SAX parser reads from an input stream or reader and reports the document as a sequence of events, so you can keep the XML on disk instead of loading it all into memory.

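import java.io.FileInputStream;
import javax.xml.parsers.*;
import org.xml.sax.*;

SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(true);
factory.setNamespaceAware(true);

SAXParser parser = factory.newSAXParser();
// Without this property, setValidating(true) validates against a DTD, not an XSD.
// The schema is then located through the document's xsi:schemaLocation hint.
parser.setProperty("http://java.sun.com/xml/jaxp/properties/schemaLanguage", "http://www.w3.org/2001/XMLSchema");

XMLReader reader = parser.getXMLReader();
// SimpleErrorHandler is your own org.xml.sax.ErrorHandler implementation.
reader.setErrorHandler(new SimpleErrorHandler());
// A byte stream lets the parser honor the encoding declared in the XML itself.
reader.parse(new InputSource(new FileInputStream("document.xml")));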
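
A related option, not part of the answer as originally written, is the JAXP javax.xml.validation API, which lets you name the XSD file explicitly instead of relying on a schema hint inside the document. The sketch below is a minimal illustration under that assumption: the class name StreamingXsdValidator and the method isValid are made up for this example. Given a StreamSource, the validator in the JDK's bundled implementation should process the input as a stream rather than building a DOM, but it is worth checking memory use on your own file.

```java
import java.io.File;
import java.io.IOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical helper class; the name is illustrative only.
class StreamingXsdValidator {

    // Returns true if xmlFile validates against xsdFile, false on the first fatal validation error.
    static boolean isValid(File xsdFile, File xmlFile) throws IOException {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(xsdFile);
            Validator validator = schema.newValidator();
            // The default error handler throws SAXException on validation errors.
            validator.validate(new StreamSource(xmlFile));
            return true;
        } catch (SAXException e) {
            System.err.println("Validation failed: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: StreamingXsdValidator <schema.xsd> <document.xml>");
            return;
        }
        System.out.println(isValid(new File(args[0]), new File(args[1])) ? "valid" : "invalid");
    }
}
```

To collect every error instead of stopping at the first one, you could call validator.setErrorHandler with your own ErrorHandler before validate.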
jodonnell
8

Use libxml, which performs validation and has a streaming mode.
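
If you want to drive libxml2 from a Java build or test harness, one rough way is to shell out to its xmllint command-line tool. The sketch below assumes xmllint is installed and on the PATH; the class XmllintRunner and its methods are hypothetical names for this example. It passes --stream (streaming mode), --noout (suppress echoing the document), and --schema (the XSD to validate against). A non-zero exit code is treated as failure.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper around the xmllint command-line tool from libxml2.
class XmllintRunner {

    // Builds the argument list for a streaming XSD validation run.
    static List<String> buildCommand(String xsdPath, String xmlPath) {
        return Arrays.asList("xmllint", "--stream", "--noout", "--schema", xsdPath, xmlPath);
    }

    // Runs xmllint and returns its exit code; assumes xmllint is on the PATH.
    static int run(String xsdPath, String xmlPath) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(xsdPath, xmlPath))
                .inheritIO() // show xmllint's own diagnostics on this process's console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length != 2) {
            System.err.println("usage: XmllintRunner <schema.xsd> <document.xml>");
            return;
        }
        int exitCode = run(args[0], args[1]);
        System.out.println(exitCode == 0 ? "valid" : "xmllint exited with code " + exitCode);
    }
}
```

The same arguments work directly from a shell, which is probably the simpler route when Java is not otherwise involved.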

John Millikin
  • @oob Yes, libxml2 works perfectly. Also, if anyone is looking for the Windows binaries, they are here: ftp://ftp.zlatkovic.com/libxml/ – sfarbota May 05 '14 at 17:09
3

Personally, I like XMLStarlet, which has a command-line interface and works on streams. It is a set of tools built on libxml2.

dlamblin
1

SAX and libxml will help, as already mentioned. You could also try increasing the JVM's maximum heap size with the -Xmx option. For example, to set the maximum heap size to 512 MB: java -Xmx512m com.foo.MyClass
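
To confirm that the -Xmx setting actually took effect, a small check like the following can help. It is only a sketch, and the class name HeapCheck is made up for this example. Runtime.maxMemory() reports the most memory the JVM will attempt to use, which may come out slightly below the value passed to -Xmx.

```java
// Hypothetical helper that reports the JVM's maximum heap size.
class HeapCheck {

    // Maximum memory the JVM will attempt to use, in whole megabytes.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

Run it with the same option you plan to use for validation, for example java -Xmx512m HeapCheck, and compare the reported figure.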

GaZ