37

I'm trying to validate an XML file against a number of different schemas (apologies for the contrived example):

  • a.xsd
  • b.xsd
  • c.xsd

c.xsd includes b.xsd, and b.xsd in turn includes a.xsd, using:

<xs:include schemaLocation="b.xsd"/>

I'm trying to do this via Xerces in the following manner:

XMLSchemaFactory xmlSchemaFactory = new XMLSchemaFactory();
Schema schema = xmlSchemaFactory.newSchema(new StreamSource[] {
    new StreamSource(this.getClass().getResourceAsStream("a.xsd"), "a.xsd"),
    new StreamSource(this.getClass().getResourceAsStream("b.xsd"), "b.xsd"),
    new StreamSource(this.getClass().getResourceAsStream("c.xsd"), "c.xsd")
});
Validator validator = schema.newValidator();
validator.validate(new StreamSource(new StringReader(xmlContent)));

but this fails to load all three schemas correctly, resulting in the error: cannot resolve the name 'blah' to a(n) 'group' component.

I've validated this successfully using Python, but I'm having real problems with Java 6.0 and Xerces 2.8.1. Can anybody suggest what's going wrong here, or an easier approach to validating my XML documents?

Jonathan Holloway

8 Answers

18

So just in case anybody else runs into the same issue here: I needed to load a parent schema (and its implicit child schemas) from a unit test, as a resource, to validate an XML String. I used the Xerces XMLSchemaFactory to do this, along with the Java 6 validator.

In order to load the child schemas correctly via an include, I had to write a custom resource resolver. Code can be found here:

https://code.google.com/p/xmlsanity/source/browse/src/com/arc90/xmlsanity/validation/ResourceResolver.java

To use the resolver specify it on the schema factory:

xmlSchemaFactory.setResourceResolver(new ResourceResolver());

and it will use it to resolve your resources via the classpath (in my case from src/main/resources). Any comments are welcome on this...
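For readers who cannot reach the linked code, here is a minimal sketch of what such a resolver might look like. It is not the xmlsanity class itself: the class name `LookupResourceResolver` and the `opener` function are hypothetical, and the sketch assumes each schemaLocation is a plain name that can be looked up directly (no `../` handling).

```java
import java.io.InputStream;
import java.util.function.Function;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSResourceResolver;

// Hypothetical resolver: hands the schemaLocation of every xs:include or
// xs:import to a lookup function and wraps the returned stream in an LSInput.
class LookupResourceResolver implements LSResourceResolver {
    private final Function<String, InputStream> opener;
    private final DOMImplementationLS ls;

    LookupResourceResolver(Function<String, InputStream> opener) {
        this.opener = opener;
        try {
            // Ask the DOM registry for a Load/Save implementation so that we
            // do not have to implement LSInput by hand.
            this.ls = (DOMImplementationLS) DOMImplementationRegistry
                    .newInstance().getDOMImplementation("LS");
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No DOM Load/Save implementation available", e);
        }
    }

    @Override
    public LSInput resolveResource(String type, String namespaceURI,
                                   String publicId, String systemId, String baseURI) {
        if (systemId == null) {
            return null; // nothing to look up, let the parser use its default behaviour
        }
        InputStream stream = opener.apply(systemId);
        if (stream == null) {
            return null; // not found here, fall back to default resolution
        }
        LSInput input = ls.createLSInput();
        input.setByteStream(stream);
        input.setSystemId(systemId);
        input.setPublicId(publicId);
        input.setBaseURI(baseURI);
        return input;
    }
}
```

To resolve from the classpath, as described above, the lookup could be something like `new LookupResourceResolver(name -> MyTest.class.getResourceAsStream(name))`, where `MyTest` stands for whichever class sits next to the schema resources.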

Jonathan Holloway
  • 4
    Any chance of elaborating on this a bit further as to how the custom resource resolver makes this all work? Thanks. – Casey Nov 11 '09 at 20:22
  • I can add that you have to add something like this: ` ` in the parent xsd loaded with `new StreamSource(this.getClass().getResourceAsStream("parent.xsd"))` – Jaime Hablutzel Jul 23 '12 at 23:44
  • 1
    Did you create an "artificial" parent schema that imported all the other ones? – zedoo Jul 22 '13 at 14:02
  • 2
    The link isn't working any more, but I found the code example here: http://code.google.com/p/xmlsanity/source/browse/src/com/arc90/xmlsanity/validation/ResourceResolver.java?r=03a92d97f15904b3892922e45724bb086d54fa4e. – Tom Saleeba Feb 19 '15 at 05:38
  • 2
    As far as I can see the code is here now: https://github.com/arc90/xmlsanity/blob/master/src/main/java/com/arc90/xmlsanity/util/ClassBasedResourceResolver.java – beat Jan 17 '17 at 14:42
  • Google code is dead. Please transfer your code elsewhere. – Holger Jakobs Apr 05 '17 at 21:17
7

http://www.kdgregory.com/index.php?page=xml.parsing section 'Multiple schemas for a single document'

My solution based on that document:

URL xsdUrlA = this.getClass().getResource("a.xsd");
URL xsdUrlB = this.getClass().getResource("b.xsd");
URL xsdUrlC = this.getClass().getResource("c.xsd");

SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
//---
String W3C_XSD_TOP_ELEMENT =
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\n"
   + "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\" elementFormDefault=\"qualified\">\n"
   + "<xs:include schemaLocation=\"" +xsdUrlA.getPath() +"\"/>\n"
   + "<xs:include schemaLocation=\"" +xsdUrlB.getPath() +"\"/>\n"
   + "<xs:include schemaLocation=\"" +xsdUrlC.getPath() +"\"/>\n"
   +"</xs:schema>";
Schema schema = schemaFactory.newSchema(new StreamSource(new StringReader(W3C_XSD_TOP_ELEMENT), "xsdTop"));
iolha
2

The schema stuff in Xerces is (a) very, very pedantic, and (b) gives utterly useless error messages when it doesn't like what it finds. It's a frustrating combination.

The schema stuff in Python may be a lot more forgiving, and may have been letting small errors in the schema go past unreported.

Now if, as you say, c.xsd includes b.xsd, and b.xsd includes a.xsd, then there's no need to load all three into the schema factory. Not only is it unnecessary, it will likely confuse Xerces and result in errors, so this may be your problem. Just pass c.xsd to the factory, and let it resolve b.xsd and a.xsd itself, which it should do relative to c.xsd.
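A minimal sketch of that suggestion, assuming the three schemas sit next to each other on the file system. The class and method names (`RootSchemaValidation`, `isValid`) are made up for illustration; the point is that `newSchema(File)` gives the root schema a system ID, which is what lets the factory resolve the relative includes.

```java
import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper: compiles only the root schema and validates a string against it.
class RootSchemaValidation {
    // Returns true if xmlContent is valid against the schema rooted at rootXsd.
    // A SAXException from this method means the schema itself could not be compiled.
    static boolean isValid(File rootXsd, String xmlContent) throws SAXException, IOException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // Only the root schema is passed in; its includes are resolved
        // relative to the location of rootXsd.
        Schema schema = factory.newSchema(rootXsd);
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xmlContent)));
            return true;
        } catch (SAXException e) {
            return false; // the document violated the schema or was malformed
        }
    }
}
```

When the schema is read from an `InputStream` instead, the stream carries no location of its own, so a system ID has to be supplied separately (or a resource resolver installed) for relative includes to be found.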

skaffman
  • Yeah this seems to result in the same error too. I'm wondering whether the import declarations in the schema files are causing issues... It doesn't help that two of the schemas have no target namespace either... gargh – Jonathan Holloway Jul 07 '09 at 22:47
  • Maybe one of the ways to resolve this is to use a ResourceResolver and set it on the schema factory... – Jonathan Holloway Jul 07 '09 at 23:01
  • 4
    Are you sure you're not mixing up import and include? They mean two different things, and shouldn't be confused. Are a, b and c in different namespaces? If so, then they should be imported, not included. If they're in the same namespace, they should be included. – skaffman Jul 08 '09 at 07:01
  • 1
    I didn't write the schemas, nor can I change them. include is used, and they are in different namespaces; I'm not quite sure why. I had to write a custom resolver and load the root schema to get this to work in the end... but thanks for the pointer on loading the root schema anyway... – Jonathan Holloway Jul 08 '09 at 23:17
  • 1
    @skaffman I learned that the order of the XSDs can be significant. For instance, I have two XSD files, a.xsd and b.xsd. In my XML file, the namespace that belongs to a.xsd is used first, and the next namespace belongs to b.xsd. So I have to validate the XML file against a.xsd, b.xsd (not b.xsd, a.xsd). But I detected this manually. How can I detect it automatically? – limonik Aug 01 '16 at 14:16
2

I faced the same problem and after investigating found this solution. It works for me.

Enum to setup the different XSDs:

public enum XsdFile {
    // @formatter:off
    A("a.xsd"),
    B("b.xsd"),
    C("c.xsd");
    // @formatter:on

    private final String value;

    private XsdFile(String value) {
        this.value = value;
    }

    public String getValue() {
        return this.value;
    }
}

Method to validate:

public static void validateXmlAgainstManyXsds() {
    final SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);

    String xmlFile;
    xmlFile = "example.xml";

    // Use of Enum class in order to get the different XSDs
    Source[] sources = new Source[XsdFile.class.getEnumConstants().length];
    for (XsdFile xsdFile : XsdFile.class.getEnumConstants()) {
        sources[xsdFile.ordinal()] = new StreamSource(xsdFile.getValue());
    }

    try {
        final Schema schema = schemaFactory.newSchema(sources);
        final Validator validator = schema.newValidator();
        System.out.println("Validating " + xmlFile + " against XSDs " + Arrays.toString(sources));
        validator.validate(new StreamSource(new File(xmlFile)));
    } catch (Exception exception) {
        System.out.println("ERROR: Unable to validate " + xmlFile + " against XSDs " + Arrays.toString(sources)
                + " - " + exception);
    }
    System.out.println("Validation process completed.");
}
Weslor
2

From the Xerces documentation: http://xerces.apache.org/xerces2-j/faq-xs.html

import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

...

StreamSource[] schemaDocuments = /* created by your application */;
Source instanceDocument = /* created by your application */;

SchemaFactory sf = SchemaFactory.newInstance(
    "http://www.w3.org/XML/XMLSchema/v1.1");
Schema s = sf.newSchema(schemaDocuments);
Validator v = s.newValidator();
v.validate(instanceDocument);
Hesse
1

I ended up using this:

import org.apache.xerces.parsers.SAXParser;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;
import java.io.IOException;
 .
 .
 .
 try {
        SAXParser parser = new SAXParser();
        parser.setFeature("http://xml.org/sax/features/validation", true);
        parser.setFeature("http://apache.org/xml/features/validation/schema", true);
        parser.setFeature("http://apache.org/xml/features/validation/schema-full-checking", true);
        parser.setProperty("http://apache.org/xml/properties/schema/external-noNamespaceSchemaLocation", "http://your_url_schema_location");

        Validator handler = new Validator();
        parser.setErrorHandler(handler);
        parser.parse("file:///" + "/home/user/myfile.xml");

 } catch (SAXException e) {
    e.printStackTrace();
 } catch (IOException ex) {
    ex.printStackTrace();
 }


class Validator extends DefaultHandler {
    public boolean validationError = false;
    public SAXParseException saxParseException = null;

    public void error(SAXParseException exception)
            throws SAXException {
        validationError = true;
        saxParseException = exception;
    }

    public void fatalError(SAXParseException exception)
            throws SAXException {
        validationError = true;
        saxParseException = exception;
    }

    public void warning(SAXParseException exception)
            throws SAXException {
    }
}

Remember to change:

1) The parameter "http://your_url_schema_location" to your XSD file location.

2) The string "/home/user/myfile.xml" for the one pointing to your xml file.

I didn't have to set the variable: -Djavax.xml.validation.SchemaFactory:http://www.w3.org/2001/XMLSchema=org.apache.xerces.jaxp.validation.XMLSchemaFactory
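The handler above keeps only the last error it saw. As a variation, here is a rough sketch of an error handler that collects every reported problem, written against the standard `javax.xml.validation` API rather than the Xerces `SAXParser`. The class name `CollectingErrorHandler` and its `validate` helper are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical handler: records recoverable validation errors instead of
// stopping at the first one.
class CollectingErrorHandler implements ErrorHandler {
    final List<SAXParseException> problems = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        // warnings are ignored in this sketch
    }

    @Override
    public void error(SAXParseException e) {
        problems.add(e); // schema violation: record it and keep validating
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        throw e; // e.g. malformed XML: validation cannot continue
    }

    // Validates xml against schema and returns all recorded violations.
    // Fatal errors still surface as a SAXException.
    static List<SAXParseException> validate(Schema schema, String xml)
            throws SAXException, IOException {
        CollectingErrorHandler handler = new CollectingErrorHandler();
        Validator validator = schema.newValidator();
        validator.setErrorHandler(handler);
        validator.validate(new StreamSource(new StringReader(xml)));
        return handler.problems;
    }
}
```

Each recorded `SAXParseException` carries a line and column number (`getLineNumber()`, `getColumnNumber()`), which helps when a document has several problems at once.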

Baz
Edenshaw
1

Just in case anybody still comes here looking for a way to validate XML or an object against multiple XSDs, here is what worked for me:

// Using a **URL** is the most important part here. With a URL, relative paths in include and import elements inside the XSD file are resolved. Load only the parent-level XSD here (not all of the included XSDs).

URL xsdUrl = getClass().getClassLoader().getResource("my/parent/schema.xsd");

SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = schemaFactory.newSchema(xsdUrl);

JAXBContext jaxbContext = JAXBContext.newInstance(MyClass.class);
Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();
unmarshaller.setSchema(schema);

/* If you need to validate an object against the XSD, uncomment this
ObjectFactory objectFactory = new ObjectFactory();
JAXBElement<MyClass> wrappedObject = objectFactory.createMyClassObject(myClassObject);
Marshaller marshaller = jaxbContext.createMarshaller();
marshaller.setSchema(schema);
marshaller.marshal(wrappedObject, new DefaultHandler());
*/

unmarshaller.unmarshal(getClass().getClassLoader().getResource("your/xml/file.xml"));
Shafiul
0

If all XSDs belong to the same namespace, then create a new XSD and include the other XSDs in it. Then, in Java, create the schema from the new XSD.

Schema schema = xmlSchemaFactory.newSchema(
    new StreamSource(this.getClass().getResourceAsStream("/path/to/all_in_one.xsd")));

all_in_one.xsd :

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
 xmlns:ex="http://example.org/schema/" 
 targetNamespace="http://example.org/schema/" 
 elementFormDefault="unqualified"
 attributeFormDefault="unqualified">

    <xs:include schemaLocation="relative/path/to/a.xsd"></xs:include>
    <xs:include schemaLocation="relative/path/to/b.xsd"></xs:include>
    <xs:include schemaLocation="relative/path/to/c.xsd"></xs:include>

</xs:schema>
Dojo