I recently needed to implement schema validation in Java with a schema that imports another schema (henceforth I'll refer to this as a schema hierarchy). Much to my surprise, I found that a schema hierarchy greatly complicates the code needed for simple standalone schema validation. My understanding of the fix (and what I created) is an implementation of the LSResourceResolver interface plus an implementation of the LSInput interface for it to return. As far as I can tell, this is necessary whenever you validate against a hierarchy of schemas.
I find this frustrating because once the validator has a handle to the root schema, any imports are simply relative to that location. Wanting to make validation easier and reusable, I proceeded to create a resolver that would ultimately simplify schema validation to two inputs for every situation.
- What's the root schema?
- What's the payload you want to validate?
In other words my goal is to make something like the following work for any schema structure:
XmlValidator validator = new XmlValidator("some/dir/root.xsd");
validator.validate("<xml><someXml/></xml>");
The documentation for resolveResource, the method that is called to load resources, reveals the first issue: the resolver isn't called to load the root resource (the root schema). You need that root schema's path to be able to look up the other relative paths from it. This can be overcome by passing the root path into the resolver's constructor and tracking it manually.
Then comes the roadblock. The systemId parameter reliably contains the resource that is being resolved/loaded (this string is exactly the value of the import/include/redefine schemaLocation attribute). For example, if the schema you are currently loading has this line:
<xsd:include schemaLocation="../given/redefine.xsd"/>
The systemId when loading redefine.xsd will be:
"../given/redefine.xsd"
However, the baseURI parameter, which is supposed to hold the resource that was previously being loaded (and which you must know, because the relative path is built from that resource's location), can be null, and in my experience it is null for about two thirds of the schemas that get loaded.
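If the base resource were always known, combining it with systemId would be straightforward. As a minimal sketch (the class and method names are hypothetical and not part of the code in this question), java.net.URI can resolve a schemaLocation value against the classpath-style path of the schema that contains it:

```java
import java.net.URI;

public class SystemIdResolutionSketch {

    // Hypothetical helper: resolves a schemaLocation value (the systemId)
    // against the path of the schema that references it, collapsing "./" and "../".
    static String resolve(String baseResource, String systemId) {
        return URI.create(baseResource).resolve(systemId).normalize().toString();
    }

    public static void main(String[] args) {
        // prints sandbox/given/redefine.xsd
        System.out.println(resolve("sandbox/custom/wrapper.xsd", "../given/redefine.xsd"));
        // prints sandbox/custom/candy.xsd
        System.out.println(resolve("sandbox/custom/wrapper.xsd", "./candy.xsd"));
    }
}
```

The hard part is therefore not the path arithmetic but knowing which schema is the base for each lookup.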
This is the point where I feel Java's built-in validation cannot provide the solution I'm looking for. The problem we are trying to solve seems very simple: given a root schema, load all other included schemas based on the root schema's location. Unless I'm missing something, this is impossible because baseURI can be null, and thus the previous schema cannot be tracked.
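One thing that may be worth checking is whether the root schema Source carries a system ID at all: a StreamSource built from a bare InputStream has none, so the parser has nothing to resolve relative locations against. The sketch below is an assumption-laden illustration, not a verified fix for the sample project. It writes a tiny two-file hierarchy to a temporary directory (the file contents and the class name RootSystemIdSketch are made up for the example) and loads the root through SchemaFactory.newSchema(URL), with no custom resolver:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class RootSystemIdSketch {

    private static final String XSD_NS = "http://www.w3.org/2001/XMLSchema";

    // Writes custom/wrapper.xsd (which includes ../given/base.xsd) into a
    // temporary directory and returns the path of the root schema.
    static Path writeSchemas() throws IOException {
        Path dir = Files.createTempDirectory("schemas");
        Path custom = Files.createDirectories(dir.resolve("custom"));
        Path given = Files.createDirectories(dir.resolve("given"));
        String base = "<xsd:schema xmlns:xsd='" + XSD_NS + "'>"
                + "<xsd:complexType name='Fruit_Type'><xsd:sequence>"
                + "<xsd:element name='Apple' type='xsd:string'/>"
                + "</xsd:sequence></xsd:complexType></xsd:schema>";
        String wrapper = "<xsd:schema xmlns:xsd='" + XSD_NS + "'>"
                + "<xsd:include schemaLocation='../given/base.xsd'/>"
                + "<xsd:element name='Wrapper' type='Fruit_Type'/></xsd:schema>";
        Files.write(given.resolve("base.xsd"), base.getBytes(StandardCharsets.UTF_8));
        Path root = custom.resolve("wrapper.xsd");
        Files.write(root, wrapper.getBytes(StandardCharsets.UTF_8));
        return root;
    }

    // Returns true if the XML validates against the hierarchy, false if the
    // schema or the document is rejected.
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            // newSchema(URL) gives the root schema a system ID, so the relative
            // include can be resolved against it by the parser itself.
            Schema schema = factory.newSchema(writeSchemas().toUri().toURL());
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<Wrapper><Apple>red</Apple></Wrapper>"));
        System.out.println(isValid("<Wrapper><Pear/></Wrapper>"));
    }
}
```

For classpath resources, the URL returned by getClassLoader().getResource(...) would presumably play the same role as the file URL here, though that has not been tried against the sample project in this question.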
Surely we can't be this far along in Java's lifetime without this problem being resolved. What am I missing here? Is it really impossible to write a validation utility that takes only the two inputs above? What are others using for schema validation? I have to believe others don't keep rolling custom resolver classes to dance around a schema hierarchy (which should be fairly common).
Here is a simple representation of the problem I am trying to solve. I am looking for the simplest, most Java-like way to solve this sample problem:
Assume a sample project structure of:
src/main/java/sandbox/TestValidation.java
src/main/resources/sandbox/sample.xml
src/main/resources/sandbox/custom/wrapper.xsd
src/main/resources/sandbox/custom/candy.xsd
src/main/resources/sandbox/given/base.xsd
src/main/resources/sandbox/given/redefine.xsd
TestValidation.java:
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;
import com.blarg.validation.XmlValidator;

public class SchemaValidationTest {

    public SchemaValidationTest() throws Exception {
        // The linked suggested solution which fails because
        // it cannot load the first referenced schema
        Source schemaFile = new StreamSource(
                getClass().getClassLoader()
                        .getResourceAsStream("sandbox/custom/wrapper.xsd"));
        Source xmlFile = new StreamSource(
                getClass().getClassLoader()
                        .getResourceAsStream("sandbox/sample.xml"));
        SchemaFactory schemaFactory = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = schemaFactory.newSchema(schemaFile);
        Validator validator = schema.newValidator();
        try {
            validator.validate(xmlFile);
            System.out.println(xmlFile.getSystemId() + " is valid");
        } catch (SAXException e) {
            System.out.println(xmlFile.getSystemId() + " is NOT valid");
            System.out.println("Reason: " + e.getLocalizedMessage());
        }

        // My custom validator which succeeds all the way until
        // it reaches the candy.xsd for reasons described above and again below.
        XmlValidator customValidator = new XmlValidator("sandbox/custom/wrapper.xsd");
        customValidator.validate(getClass().getClassLoader().getResourceAsStream("sandbox/sample.xml"));
    }

    public static void main(String[] args) throws Exception {
        new SchemaValidationTest();
    }
}
sample.xml:
<?xml version="1.0" encoding="UTF-8"?>
<Wrapper> <!-- wrapper.xsd -->
    <GiftBasket>
        <Fruit> <!-- base.xsd -->
            <Apple>
                <Size>medium</Size>
                <Color>Red</Color> <!-- redefine.xsd -->
            </Apple>
            <Orange>
                <Size>large</Size>
            </Orange>
        </Fruit>
        <Candy> <!-- candy.xsd -->
            <Caramel>salted</Caramel>
        </Candy>
    </GiftBasket>
</Wrapper>
wrapper.xsd:
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified">
    <xsd:include schemaLocation="../given/redefine.xsd"/>
    <xsd:include schemaLocation="./candy.xsd"/>
    <xsd:element name="Wrapper">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="GiftBasket" type="GiftBasket_Type" minOccurs="1" maxOccurs="1"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
    <xsd:complexType name="GiftBasket_Type">
        <xsd:sequence>
            <!-- From base.xsd (and apple is redefined in redefine.xsd) -->
            <xsd:element name="Fruit" type="Fruit_Type" minOccurs="1" maxOccurs="1"/>
            <!-- From candy.xsd -->
            <xsd:element name="Candy" type="Candy_Type" minOccurs="0" maxOccurs="1"/>
        </xsd:sequence>
    </xsd:complexType>
</xsd:schema>
base.xsd:
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified" attributeFormDefault="unqualified">
    <xsd:complexType name="Fruit_Type">
        <xsd:sequence>
            <xsd:element name="Apple" type="Apple_Type" minOccurs="0" maxOccurs="unbounded" />
            <xsd:element name="Orange" type="Orange_Type" minOccurs="0" maxOccurs="unbounded" />
        </xsd:sequence>
    </xsd:complexType>
    <!-- This is redefined in redefine.xsd to include additional elements -->
    <xsd:complexType name="Apple_Type">
        <xsd:sequence>
            <xsd:element name="Size" type="xsd:string" minOccurs="0" maxOccurs="1" />
        </xsd:sequence>
    </xsd:complexType>
    <xsd:complexType name="Orange_Type">
        <xsd:sequence>
            <xsd:element name="Size" type="xsd:string" minOccurs="0" maxOccurs="1" />
        </xsd:sequence>
    </xsd:complexType>
</xsd:schema>
redefine.xsd:
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified" attributeFormDefault="unqualified">
    <xsd:redefine schemaLocation="./base.xsd">
        <xsd:complexType name="Apple_Type">
            <xsd:complexContent>
                <xsd:extension base="Apple_Type">
                    <xsd:sequence>
                        <xsd:element name="Color" type="xsd:string"/>
                    </xsd:sequence>
                </xsd:extension>
            </xsd:complexContent>
        </xsd:complexType>
    </xsd:redefine>
</xsd:schema>
candy.xsd:
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified">
    <xsd:complexType name="Candy_Type">
        <xsd:choice>
            <xsd:element name="Chocolate" type="xsd:string" minOccurs="0" maxOccurs="unbounded" />
            <xsd:element name="Caramel" type="xsd:string" minOccurs="0" maxOccurs="unbounded" />
        </xsd:choice>
    </xsd:complexType>
</xsd:schema>
If you care to see my current implementation of LSResourceResolver, which gets me close to a solution, it is shown below. If the include for candy.xsd and the element that references it are removed from wrapper.xsd and sample.xml, this validates. It does not work as-is because, when candy.xsd is being loaded, the previously loaded path was in sandbox/given and the systemId passed in is ./candy.xsd, so the resolver looks for candy.xsd in the wrong location:
package com.blarg.validation;

import java.io.InputStream;
import java.util.LinkedList;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSResourceResolver;
import com.blarg.validation.exception.SchemaNotFoundException;

public class SchemaResolver implements LSResourceResolver {

    private String path;
    private ClassLoader classLoader;
    private SchemaTracker tracker;

    public SchemaResolver(String path, ClassLoader classLoader, SchemaTracker tracker) {
        this.path = path;
        this.tracker = tracker;
        this.classLoader = classLoader;
    }

    public LSInput resolveResource(String type, String namespaceURI, String publicId, String systemId, String baseURI) {
        String classloaderPath = generateClassloaderResourcePath(path, systemId);
        tracker.setLastLoadedSchema(classloaderPath);
        InputStream is = classLoader.getResourceAsStream(classloaderPath);
        if (is == null) {
            throw new SchemaNotFoundException("Loading the root schema succeeded, but the following referenced schema could not be found: '"
                    + classloaderPath
                    + "' Make sure the root schema and referenced schemas are all in the same directory. Then verify any <xsd:include>, "
                    + "<xsd:import>, or <xsd:redefine> tags all have correct 'schemaLocation' attribute values.");
        }
        /*
         * Store the last used path so the next schema lookup is relative to it.
         * This is a hack and will only work if:
         * some/dir/a.xsd imports some/dir/another/b.xsd
         * and some/dir/another/b.xsd imports some/dir/other/c.xsd
         * etc..
         *
         * It will *not* work for:
         * some/dir/a.xsd imports some/dir/another/b.xsd
         * some/dir/another/b.xsd imports some/dir/other/c.xsd
         * etc..
         * AND
         * some/dir/a.xsd also imports some/dir/d.xsd
         *
         * It will fail loading d.xsd because the last stored path
         * will be /some/dir/other and the systemId coming in will
         * be "./d.xsd"
         */
        path = classloaderPath.substring(0, classloaderPath.lastIndexOf("/") + 1);
        return new SchemaInput(publicId, systemId, is);
    }

    private String generateClassloaderResourcePath(String path, String systemId) {
        // fullPath may contain ./ or ../ which is not allowed in classloader resource lookups.
        String fullPath = path + systemId;
        LinkedList<String> linkedList = new LinkedList<String>();
        String current = first(fullPath);
        while (current != null) {
            if (".".equals(current)) {
                // Do nothing, dot represents the current directory so we have it already
            } else if ("..".equals(current)) {
                // Remove the lastly added directory because we need to go up
                linkedList.removeLast();
            } else {
                // The directory is just a normal directory or filename, add it
                linkedList.add(current);
            }
            fullPath = removeFirst(fullPath);
            current = first(fullPath);
        }
        String classLoaderPath = "";
        while (linkedList.size() > 0) {
            classLoaderPath = classLoaderPath + linkedList.removeFirst() + "/";
        }
        classLoaderPath = classLoaderPath.substring(0, classLoaderPath.length() - 1);
        System.out.println("classLoaderPath: " + classLoaderPath);
        System.out.println();
        return classLoaderPath;
    }

    private String first(String path) {
        if (path == null) {
            return null;
        } else if (path.contains("/")) {
            return path.substring(0, path.indexOf("/"));
        } else {
            return path;
        }
    }

    private String removeFirst(String path) {
        if (path.contains("/")) {
            return path.substring(path.indexOf("/") + 1);
        } else {
            return null;
        }
    }
}
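To make the failure concrete, the "last loaded directory" strategy can be simulated without any XML parsing. This is only a sketch: the class LastPathTraceSketch is hypothetical, it uses java.net.URI in place of the hand-rolled path handling in the resolver, and it assumes the load order described in this question (redefine.xsd, then base.xsd, then candy.xsd):

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class LastPathTraceSketch {

    // Mimics the resolver's bookkeeping: each lookup is resolved against the
    // directory of the most recently loaded schema, which then becomes the
    // directory used for the next lookup.
    static List<String> trace(String rootDir, String... systemIds) {
        List<String> resolved = new ArrayList<>();
        String dir = rootDir;
        for (String systemId : systemIds) {
            String path = URI.create(dir).resolve(systemId).normalize().toString();
            resolved.add(path);
            dir = path.substring(0, path.lastIndexOf('/') + 1);
        }
        return resolved;
    }

    public static void main(String[] args) {
        // Assumed load order: redefine.xsd, then base.xsd, then candy.xsd.
        for (String path : trace("sandbox/custom/",
                "../given/redefine.xsd", "./base.xsd", "./candy.xsd")) {
            System.out.println(path);
        }
        // The last line printed is sandbox/given/candy.xsd, although the file
        // actually lives at sandbox/custom/candy.xsd.
    }
}
```

The first two lookups come out right only because each schema happens to be referenced from the one loaded immediately before it; the third is relative to wrapper.xsd, which is no longer the tracked location.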
The resolver of course needs to be instantiated correctly (give it the correct path to the root schema) and registered with the schemaFactory using:
schemaFactory.setResourceResolver(new SchemaResolver(pathToSchemas, classLoader, tracker));
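The SchemaInput class returned by the resolver is not shown above. For readers who want to reproduce the setup, a minimal LSInput implementation might look like the following sketch. The name SchemaInputSketch and its constructor shape are assumptions modeled on the call new SchemaInput(publicId, systemId, is); only the byte stream, public ID, system ID and base URI are backed by fields, and the remaining setters are deliberately no-ops:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.Reader;
import org.w3c.dom.ls.LSInput;

public class SchemaInputSketch implements LSInput {

    private String publicId;
    private String systemId;
    private String baseURI;
    private InputStream byteStream;

    public SchemaInputSketch(String publicId, String systemId, InputStream byteStream) {
        this.publicId = publicId;
        this.systemId = systemId;
        this.byteStream = byteStream;
    }

    // The schema content is handed to the parser as a byte stream only.
    @Override public InputStream getByteStream() { return byteStream; }
    @Override public void setByteStream(InputStream byteStream) { this.byteStream = byteStream; }

    @Override public String getPublicId() { return publicId; }
    @Override public void setPublicId(String publicId) { this.publicId = publicId; }
    @Override public String getSystemId() { return systemId; }
    @Override public void setSystemId(String systemId) { this.systemId = systemId; }
    @Override public String getBaseURI() { return baseURI; }
    @Override public void setBaseURI(String baseURI) { this.baseURI = baseURI; }

    // Unused input forms: no character stream, no string data, and the
    // encoding is left for the parser to detect from the bytes.
    @Override public Reader getCharacterStream() { return null; }
    @Override public void setCharacterStream(Reader characterStream) { }
    @Override public String getStringData() { return null; }
    @Override public void setStringData(String stringData) { }
    @Override public String getEncoding() { return null; }
    @Override public void setEncoding(String encoding) { }
    @Override public boolean getCertifiedText() { return false; }
    @Override public void setCertifiedText(boolean certifiedText) { }

    public static void main(String[] args) {
        LSInput input = new SchemaInputSketch(null, "./candy.xsd",
                new ByteArrayInputStream(new byte[0]));
        System.out.println(input.getSystemId());
    }
}
```

Whether the parser populates baseURI on later calls may depend on what getSystemId() and getBaseURI() return here, for example a fully resolved location instead of the raw relative schemaLocation; that is a guess worth testing rather than documented behaviour.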