I need to figure out how to validate my XML files against schemas offline. After looking around for a couple of days, all I could find was that I need a local reference to the schema: find the schemas, download them, and change the reference to a local system path. What I could not find was exactly how to do that. Where and how can I change the reference to point to a local copy instead of an external one? What is the best way to download the schemas?

jschnasse
ErinMorgan

3 Answers

There are three ways you could do this. What they all have in common is that you need a local copy of the schema document(s). I'm assuming that the instance documents currently use xsi:schemaLocation and/or xsi:noNamespaceSchemaLocation to point to a location holding the schema document(s) on the web.

(a) Modify your instance documents to refer to the local copy of the schema documents. This is usually inconvenient.

(b) Redirect the references so that a request for a remote file is redirected to a local file. The way to set this up depends on which schema validator you are using and how you are invoking it.

(c) Tell the schema processor to ignore the values of xsi:schemaLocation and xsi:noNamespaceSchemaLocation, and to validate instead against a schema that you supply using your schema processor's invocation API. Again the details depend on which schema processor you are using.

My preferred approach is (c), if only because when you are validating a source document, then by definition you don't fully trust it, so why should you trust it to contain a correct xsi:schemaLocation attribute?
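As an illustration of (c), here is a minimal sketch that assumes the schema processor is the JAXP validator bundled with the JDK (javax.xml.validation). The class name, the tiny schema and the sample documents are made up for the example. As far as I know, a Schema compiled with newSchema(Source) is treated as complete, so the xsi:noNamespaceSchemaLocation hint in the instance is not followed; other processors expose this differently.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class SuppliedSchemaValidation {
    // Hypothetical schema: a single <note> element holding a string.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='note' type='xs:string'/>"
      + "</xs:schema>";

    // The instance points at a remote schema that does not exist;
    // validation below uses the supplied schema instead.
    static final String GOOD_XML =
        "<note xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'"
      + " xsi:noNamespaceSchemaLocation='http://example.invalid/note.xsd'>hi</note>";

    // <memo> is not declared in the schema, so this document is invalid.
    static final String BAD_XML = "<memo>hi</memo>";

    static boolean isValid(String xml, String xsd) throws IOException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        try {
            // The schema comes from the caller, not from the instance document.
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // With no ErrorHandler set, validation errors surface as SAXException.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("good: " + isValid(GOOD_XML, XSD));
        System.out.println("bad:  " + isValid(BAD_XML, XSD));
    }
}
```

In a real setup the schema would be read from a local file, for example with factory.newSchema(new File(...)), rather than from a string.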

Michael Kay

XmlValidate is a simple but powerful command-line tool that can perform offline validation of single or multiple XML files against target schemas. It can scan local XML files by file name, directory, or URL.

XmlValidate automatically adds the schemaLocation based on the schema namespace and a config file that maps each namespace to a local file. The tool will validate against whatever XML Schema is referenced in the config file.

Here are example mappings of namespace to target schema in the config file:

http://www.opengis.net/kml/2.2=${XV_HOME}/schemas/kml22.xsd
http://appengine.google.com/ns/1.0=C:/xml/appengine-web.xsd
urn:oasis:names:tc:ciq:xsdschema:xAL:2.0=C:/xml/xAL.xsd

Note that the ${XV_HOME} token above is simply an alias for the top-level directory that XmlValidate is running from. The location can likewise be a full file path.

XmlValidate is an open-source project (source code available) that runs with the Java Runtime Environment (JRE). The bundled application (Java jars, examples, etc.) can be downloaded here.

If XmlValidate is run in batch mode against multiple XML files, it will provide a summary of validation results.

Errors: 17  Warnings: 0  Files: 11  Time: 1506 ms
Valid files 8/11 (73%)
CodeMonkey

You can set your own implementation of LSResourceResolver on the SchemaFactory so that the LSInput it returns supplies each referenced schema from a local path.

I have written an extra class to do offline validation. You can call it like

new XmlSchemaValidator().validate(xmlStream, schemaStream, "https://schema.datacite.org/meta/kernel-4.1/",
                        "schemas/datacite/kernel-4.1/");

Two InputStreams are passed: one for the XML, one for the schema. A baseUrl and a localPath (relative to the classpath) are passed as the third and fourth parameters. The validator uses these last two parameters to look up additional schemas locally at localPath or relative to the provided baseUrl.

I have tested with a set of schemas and examples from https://schema.datacite.org/meta/kernel-4.1/ .

Complete Example:

 @Test
 public void validate4() throws Exception {
        InputStream xmlStream = Thread.currentThread().getContextClassLoader().getResourceAsStream(
                        "schemas/datacite/kernel-4.1/example/datacite-example-complicated-v4.1.xml");
        InputStream schemaStream = Thread.currentThread().getContextClassLoader()
                        .getResourceAsStream("schemas/datacite/kernel-4.1/metadata.xsd");
        new XmlSchemaValidator().validate(xmlStream, schemaStream, "https://schema.datacite.org/meta/kernel-4.1/",
                        "schemas/datacite/kernel-4.1/");
 }

The XmlSchemaValidator validates the XML against the schema and searches locally for included schemas. It uses an LSResourceResolver to override the standard behaviour and search locally.

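import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.SocketTimeoutException;
import java.net.URI;
import java.net.URL;

import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;

// DOMInputImpl ships with Apache Xerces (xercesImpl on the classpath is assumed here);
// the JDK-internal copy lives in com.sun.org.apache.xerces.internal.dom.
import org.apache.xerces.dom.DOMInputImpl;
import org.w3c.dom.ls.LSInput;
import org.xml.sax.SAXException;

public class XmlSchemaValidator {
    /**
     * @param xmlStream
     *            xml data as a stream
     * @param schemaStream
     *            schema as a stream
     * @param baseUri
     *            to search for relative paths on the web
     * @param localPath
     *            to search for schemas on a local directory
     * @throws SAXException
     *             if validation fails
     * @throws IOException
     *             not further specified
     */
    public void validate(InputStream xmlStream, InputStream schemaStream, String baseUri, String localPath)
                    throws SAXException, IOException {
        Source xmlFile = new StreamSource(xmlStream);
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        factory.setResourceResolver((type, namespaceURI, publicId, systemId, baseURI) -> {
            InputStream in = getSchemaAsStream(systemId, baseUri, localPath);
            if (in == null) {
                // Nothing found locally or on the web: returning null hands the lookup
                // back to the parser's default resolution instead of failing on a null stream.
                return null;
            }
            LSInput input = new DOMInputImpl();
            input.setPublicId(publicId);
            input.setSystemId(systemId);
            input.setBaseURI(baseUri);
            // A byte stream lets the parser detect the encoding from the XML declaration.
            input.setByteStream(in);
            return input;
        });
        Schema schema = factory.newSchema(new StreamSource(schemaStream));
        javax.xml.validation.Validator validator = schema.newValidator();
        validator.validate(xmlFile);
    }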

    private InputStream getSchemaAsStream(String systemId, String baseUri, String localPath) {
        InputStream in = getSchemaFromClasspath(systemId, localPath);
        // You could just return in; , if you are sure that everything is on
        // your machine. Here I call getSchemaFromWeb as last resort.
        return in == null ? getSchemaFromWeb(baseUri, systemId) : in;
    }

    private InputStream getSchemaFromClasspath(String systemId, String localPath) {
        System.out.println("Try to get stuff from localdir: " + localPath + systemId);
        return Thread.currentThread().getContextClassLoader().getResourceAsStream(localPath + systemId);
    }

    /*
     * You can leave out the webstuff if you are sure that everything is
     * available on your machine
     */
    private InputStream getSchemaFromWeb(String baseUri, String systemId) {
        try {
            URI uri = new URI(systemId);
            if (uri.isAbsolute()) {
                System.out.println("Get stuff from web: " + systemId);
                return urlToInputStream(uri.toURL(), "text/xml");
            }
            System.out.println("Get stuff from web: Host: " + baseUri + " Path: " + systemId);
            return getSchemaRelativeToBaseUri(baseUri, systemId);
        } catch (Exception e) {
            // maybe the systemId is not a valid URI or
            // the web has nothing to offer under this address
        }
        return null;
    }

    private InputStream urlToInputStream(URL url, String accept) {
        HttpURLConnection con = null;
        InputStream inputStream = null;
        try {
            con = (HttpURLConnection) url.openConnection();
            con.setConnectTimeout(15000);
            con.setRequestProperty("User-Agent", "Name of my application.");
            con.setReadTimeout(15000);
            con.setRequestProperty("Accept", accept);
            con.connect();
            int responseCode = con.getResponseCode();
            if (responseCode == HttpURLConnection.HTTP_MOVED_PERM
                            || responseCode == HttpURLConnection.HTTP_MOVED_TEMP || responseCode == 307
                            || responseCode == 303) {
                String redirectUrl = con.getHeaderField("Location");
                try {
                    URL newUrl = new URL(redirectUrl);
                    return urlToInputStream(newUrl, accept);
                } catch (MalformedURLException e) {
                    // Relative redirect: resolve it against the current URL so that
                    // scheme, host and port are all preserved.
                    URL newUrl = new URL(url, redirectUrl);
                    return urlToInputStream(newUrl, accept);
                }
            }
            inputStream = con.getInputStream();
            return inputStream;
        } catch (SocketTimeoutException e) {
            throw new RuntimeException(e);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }

    }

    private InputStream getSchemaRelativeToBaseUri(String baseUri, String systemId) {
        try {
            URL url = new URL(baseUri + systemId);
            return urlToInputStream(url, "text/xml");
        } catch (Exception e) {
            e.printStackTrace();
            throw new RuntimeException(e);
        }
    }
}

prints

Try to get stuff from localdir: schemas/datacite/kernel-4.1/http://www.w3.org/2009/01/xml.xsd
Get stuff from web: http://www.w3.org/2009/01/xml.xsd
Try to get stuff from localdir: schemas/datacite/kernel-4.1/include/datacite-titleType-v4.xsd
Try to get stuff from localdir: schemas/datacite/kernel-4.1/include/datacite-contributorType-v4.xsd
Try to get stuff from localdir: schemas/datacite/kernel-4.1/include/datacite-dateType-v4.1.xsd
Try to get stuff from localdir: schemas/datacite/kernel-4.1/include/datacite-resourceType-v4.1.xsd
Try to get stuff from localdir: schemas/datacite/kernel-4.1/include/datacite-relationType-v4.1.xsd
Try to get stuff from localdir: schemas/datacite/kernel-4.1/include/datacite-relatedIdentifierType-v4.xsd
Try to get stuff from localdir: schemas/datacite/kernel-4.1/include/datacite-funderIdentifierType-v4.xsd
Try to get stuff from localdir: schemas/datacite/kernel-4.1/include/datacite-descriptionType-v4.xsd
Try to get stuff from localdir: schemas/datacite/kernel-4.1/include/datacite-nameType-v4.1.xsd

The output shows that the validator was able to validate against a set of local schemas. Only http://www.w3.org/2009/01/xml.xsd was not available locally and was therefore fetched from the internet.

jschnasse
  • If you also want to get [http://www.w3.org/2009/01/xml.xsd](http://www.w3.org/2009/01/xml.xsd) from a local path, you could use [the publicId parameter of the LSInput interface](https://docs.oracle.com/javase/7/docs/api/org/w3c/dom/ls/LSInput.html#getPublicId()) to query a lookup structure that associates URLs with local paths. – jschnasse Jan 25 '18 at 17:19
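
A lookup structure like the one mentioned in the comment could look roughly like the sketch below. It is an assumption-laden illustration, not the code from the answer: the LookupResolver class, the compile helper and the urn:example:types schemas are made up, and the map is keyed on the systemId (the schemaLocation exactly as written in the importing schema), because the publicId is often null for xs:import. The LSInput is created through the standard DOMImplementationLS factory, so no Xerces-internal class is needed.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSResourceResolver;
import org.xml.sax.SAXException;

public class LookupResolver implements LSResourceResolver {
    // Hypothetical schemas: MAIN_XSD imports TYPES_XSD through a remote-looking URL.
    static final String TYPES_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema' targetNamespace='urn:example:types'>"
      + "<xs:simpleType name='Code'><xs:restriction base='xs:string'>"
      + "<xs:maxLength value='3'/></xs:restriction></xs:simpleType></xs:schema>";
    static final String MAIN_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema' xmlns:t='urn:example:types'>"
      + "<xs:import namespace='urn:example:types' schemaLocation='http://example.invalid/types.xsd'/>"
      + "<xs:element name='code' type='t:Code'/></xs:schema>";

    private final Map<String, Path> localCopies; // remote URL -> local file
    private final DOMImplementationLS ls;

    public LookupResolver(Map<String, Path> localCopies) throws ReflectiveOperationException {
        this.localCopies = localCopies;
        this.ls = (DOMImplementationLS) DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
    }

    @Override
    public LSInput resolveResource(String type, String namespaceURI, String publicId,
                                   String systemId, String baseURI) {
        Path local = systemId == null ? null : localCopies.get(systemId);
        if (local == null) {
            // Unmapped: null hands the lookup back to the parser. A strictly
            // offline setup might prefer to throw here instead.
            return null;
        }
        try {
            LSInput input = ls.createLSInput();
            input.setPublicId(publicId);
            input.setSystemId(systemId);
            input.setBaseURI(baseURI);
            input.setByteStream(Files.newInputStream(local));
            return input;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Compiles a schema while serving mapped imports from local files.
    static Schema compile(String mainXsd, Map<String, Path> localCopies)
            throws SAXException, ReflectiveOperationException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        factory.setResourceResolver(new LookupResolver(localCopies));
        return factory.newSchema(new StreamSource(new StringReader(mainXsd)));
    }
}
```

To use it, download each remote schema once, store it next to your own schemas, and add one map entry per URL, for example http://www.w3.org/2009/01/xml.xsd mapped to the downloaded file.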