We need to validate an XML document, held in memory as a byte stream, against an XSD that sits among other XSDs in the file system. We would like to avoid naming the schema file explicitly in the XML document, and instead tell the XML parser to use a catalog of one or more XSD files for validation.
My attempt to create a DocumentBuilder provider (for Guice 3.0) looks like:
public class ValidatingDocumentBuilderProvider implements
        Provider<DocumentBuilder> {

    static final String JAXP_SCHEMA_LANGUAGE = "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String W3C_XML_SCHEMA = "http://www.w3.org/2001/XMLSchema";
    static final String JAXP_SCHEMA_SOURCE = "http://java.sun.com/xml/jaxp/properties/schemaSource";

    Logger log = getLogger(ValidatingDocumentBuilderProvider.class);
    DocumentBuilderFactory dbf;

    public synchronized DocumentBuilder get() { // dbf not thread-safe
        if (dbf == null) {
            log.debug("Setting up DocumentBuilderFactory");
            // http://download.oracle.com/javaee/1.4/tutorial/doc/JAXPDOM8.html
            dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            dbf.setValidating(true);
            dbf.setAttribute(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
            // parser should look for schema reference in xml file

            // Find XSD's in current directory.
            FilenameFilter fileNameFilter = new FilenameFilter() {
                public boolean accept(File dir, String name) {
                    return name.toLowerCase().endsWith(".xsd");
                }
            };
            File[] schemaFiles = new File(".").listFiles(fileNameFilter);
            dbf.setAttribute(JAXP_SCHEMA_SOURCE, schemaFiles);
            log.debug("{} schema files found", schemaFiles.length);
            for (File file : schemaFiles) {
                log.debug("schema file: {}", file.getAbsolutePath());
            }
        }
        try {
            return dbf.newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new RuntimeException("get DocumentBuilder", e);
        }
    }
}
I have also tried passing file names instead of File objects. Eclipse accepts the XSD: once it is added to the Eclipse XML catalog, Eclipse can validate the XML in question.
It appears to the naked eye that the parser halts briefly when trying to validate. This might be a network lookup.
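One way to test the network-lookup suspicion might be to install an EntityResolver that records every external resource the parser asks for. The following is only a minimal sketch: the class name LookupProbe and the method requestedEntities are made up for illustration, and it has only been reasoned through for DTD references. Whether schema lookups in the JDK 6 parser pass through the same resolver is an assumption that would need checking.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the original provider.
final class LookupProbe {

    /** Parses the document and returns every external entity the parser requested. */
    static List<String> requestedEntities(byte[] xml) {
        final List<String> seen = new ArrayList<String>();
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            DocumentBuilder db = dbf.newDocumentBuilder();
            db.setEntityResolver(new EntityResolver() {
                public InputSource resolveEntity(String publicId, String systemId) {
                    // Record what was asked for.
                    seen.add(publicId + " | " + systemId);
                    // Hand back an empty source so nothing is fetched from disk or network.
                    return new InputSource(new StringReader(""));
                }
            });
            db.parse(new ByteArrayInputStream(xml));
        } catch (Exception e) {
            throw new RuntimeException("parse failed", e);
        }
        return seen;
    }
}
```

If the returned list contains an http:// system identifier, the pause is plausibly the parser trying to fetch that resource.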
Running with -Djaxp.debug=1 only adds these lines:
JAXP: find factoryId =javax.xml.parsers.DocumentBuilderFactory
JAXP: loaded from fallback value: com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl
JAXP: created new instance of class com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl using ClassLoader: null
How can I get the parser in JDK 6 to tell me what it is doing? If I cannot do that, how do I inspect the XML Catalog usage inside it to see why the XSDs provided are not selected?
What obvious thing have I overlooked?
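For comparison, here is a rough sketch of a different route than the schemaSource attribute used above: compile the XSDs with javax.xml.validation.SchemaFactory, attach the result with DocumentBuilderFactory.setSchema, and collect diagnostics through an ErrorHandler so the parser reports what it objects to. The class name SchemaCheck and the method validate are hypothetical, and this is not confirmed to be the fix for the problem described here.

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Source;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

// Hypothetical helper, not part of the original provider.
final class SchemaCheck {

    /** Validates the in-memory document against the given schemas and returns all diagnostics. */
    static List<String> validate(byte[] xml, Source... xsds) {
        final List<String> problems = new ArrayList<String>();
        try {
            SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = sf.newSchema(xsds);

            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            // setValidating stays at its default (false): per the JAXP javadoc it
            // essentially controls DTD validation, while setSchema drives XSD validation.
            dbf.setSchema(schema);

            DocumentBuilder db = dbf.newDocumentBuilder();
            db.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) {
                    problems.add("WARNING: " + e.getMessage());
                }
                public void error(SAXParseException e) {
                    problems.add("ERROR: " + e.getMessage());
                }
                public void fatalError(SAXParseException e) throws SAXParseException {
                    throw e; // malformed XML: give up
                }
            });
            db.parse(new ByteArrayInputStream(xml));
        } catch (Exception e) {
            throw new RuntimeException("validation run failed", e);
        }
        return problems;
    }
}
```

The schema files found in the directory could be passed in as one javax.xml.transform.stream.StreamSource per File. An empty result means the document validated; otherwise each entry carries the parser's own error message.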