
I'm using a SAX parser with a custom handler to parse some XML files. This works well so far, but I want to check more than just the well-formedness of a given file: I want to validate it against an XSD schema, which also contains default values for optional attributes. There are lots of tutorials online on doing this, but I was not able to find an approach that satisfies all my constraints, which are as follows:

- I don't know the schema beforehand. I have a bunch of XML and XSD files, and every XML file contains information about the XSD it should conform to.

- The validator should alter the event stream the handler receives and insert the default values for optional attributes from the XSD where necessary.

- The current custom handler should be used.

I'm fairly new to this topic, so I can't rule out that I've stumbled over the solution without being aware of it, but I'm currently completely confused about how to do this.

Here is a minimal SSCCE that shows the problem and the relevant parts:

package parserTest;

import java.io.File;
import java.io.IOException;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.validation.TypeInfoProvider;
import javax.xml.validation.ValidatorHandler;

import org.w3c.dom.ls.LSResourceResolver;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.ErrorHandler;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class ParserTest
{
  public final static void main(String[] args)
  {
    //Initialize SAX parser
    final SAXParserFactory saxFactory = SAXParserFactory.newInstance();
    SAXParser saxParser = null;
    try
    {
      saxParser = saxFactory.newSAXParser();
    }
    catch(ParserConfigurationException confEx){confEx.printStackTrace();}
    catch (SAXException saxEx){saxEx.printStackTrace();}

    //Initialize Handler
    DefaultHandler saxHandler = new CustomHandler();

    ValidatorHandler vh = new ValidatorHandler()
    {
      @Override
      public void startPrefixMapping(String prefix, String uri) throws SAXException{}

      @Override
      public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException{}

      @Override
      public void startDocument() throws SAXException{}

      @Override
      public void skippedEntity(String name) throws SAXException{}

      @Override
      public void setDocumentLocator(Locator locator){}

      @Override
      public void processingInstruction(String target, String data) throws SAXException{}

      @Override
      public void ignorableWhitespace(char[] ch, int start, int length)throws SAXException{}

      @Override
      public void endPrefixMapping(String prefix) throws SAXException{}

      @Override
      public void endElement(String uri, String localName, String qName) throws SAXException{}

      @Override
      public void endDocument() throws SAXException{}

      @Override
      public void characters(char[] ch, int start, int length) throws SAXException{}

      @Override
      public void setResourceResolver(LSResourceResolver resourceResolver){}

      @Override
      public void setErrorHandler(ErrorHandler errorHandler){}

      @Override
      public void setContentHandler(ContentHandler receiver){}

      @Override
      public TypeInfoProvider getTypeInfoProvider(){return null;}

      @Override
      public LSResourceResolver getResourceResolver(){return null;}

      @Override
      public ErrorHandler getErrorHandler(){return null;}

      @Override
      public ContentHandler getContentHandler(){return null;}
    };

    vh.setContentHandler(saxHandler);

    //Do the parsing
    File input = new File("");
    try
    {
      saxParser.parse(input, saxHandler);
      //saxParser.parse(input, vh);       //<-- First attempt, gives me error message
      //saxParser.setContentHandler(vh);  //<-- Second attempt, but my parser does not seem to know this method
    }
    catch (IOException ioEx){ioEx.printStackTrace();}
    catch (SAXException saxEx){saxEx.printStackTrace();}
  }

  /*
   * This class is the handler to be used only by this class.
   */
  static private final class CustomHandler extends DefaultHandler
  {
    //Handle start of element
    public final void startElement(String namespaceURI, String localName, String qName, Attributes atts){}

    //Handle end of Element
    public final void endElement(String namespaceURI, String localName, String qName){}

    //Handle start of characters
    public final void characters(char[] ch, int start, int length){}
  }
}
Wanderer

1 Answer


The basic principle is to insert a ValidatorHandler between the SAX parser and your ContentHandler:

https://xerces.apache.org/xerces2-j/javadocs/api/javax/xml/validation/ValidatorHandler.html

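// ValidatorHandler is abstract, so obtain it from a compiled Schema
ValidatorHandler vh = schema.newValidatorHandler();
vh.setContentHandler(originalContentHandler);
// parser is an org.xml.sax.XMLReader, e.g. from SAXParser.getXMLReader()
parser.setContentHandler(vh);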

The tricky bit is that in order to create a ValidatorHandler, you need to know what schema is in use. How is it identified? If it uses the xsi:schemaLocation attribute, then you can (probably) get the ValidatorHandler to pick it up automatically. If it uses some custom mechanism, you may have to do a "prepass" reading (some of) the source file to discover the schema, then reading it again with the ValidatorHandler in place.
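The "prepass" could look roughly like the sketch below. It assumes the schema is named by an xsi:noNamespaceSchemaLocation attribute on the root element; a custom mechanism would need its own lookup in the same place. The names SchemaHintReader and findSchemaHint are hypothetical, and StAX is used here only because it makes it easy to stop reading after the first element.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import javax.xml.XMLConstants;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: reads only as far as the root element to find the schema hint.
final class SchemaHintReader {

    /**
     * Returns the value of xsi:noNamespaceSchemaLocation on the root element,
     * or null if the root element does not carry that attribute.
     */
    static String findSchemaHint(File xml) throws IOException, XMLStreamException {
        try (InputStream in = new FileInputStream(xml)) {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
            try {
                while (reader.hasNext()) {
                    // The first START_ELEMENT event is the root element.
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        return reader.getAttributeValue(
                                XMLConstants.W3C_XML_SCHEMA_INSTANCE_NS_URI,
                                "noNamespaceSchemaLocation");
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        }
    }
}
```

The returned string is whatever the document declares, typically a relative or absolute URI, so it still has to be resolved against the location of the XML file before the schema can be compiled.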

Your ContentHandler will be notified of default values for optional attributes.
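Putting the pieces together, a minimal sketch of the whole pipeline might look like this. It assumes the XSD file is already known (for example from a prepass); ValidatingPipeline and its parse method are hypothetical names, not part of any library. The handler is attached to an XMLReader obtained from SAXParser.getXMLReader(), because SAXParser itself has no setContentHandler() and its parse() overloads accept only a DefaultHandler or HandlerBase, not a ValidatorHandler.

```java
import java.io.File;
import java.io.IOException;

import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.ValidatorHandler;

import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

// Hypothetical helper: parser -> ValidatorHandler -> your own ContentHandler.
final class ValidatingPipeline {

    static void parse(File xml, File xsd, ContentHandler handler)
            throws IOException, SAXException, ParserConfigurationException {
        // Compile the schema; this is where a real, validating ValidatorHandler comes from.
        SchemaFactory schemaFactory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = schemaFactory.newSchema(xsd);

        // The validator forwards the (possibly augmented) events to the existing handler.
        // With no ErrorHandler set, validation errors are thrown as SAXParseException.
        ValidatorHandler validator = schema.newValidatorHandler();
        validator.setContentHandler(handler);

        // ValidatorHandler requires namespace-aware SAX events.
        SAXParserFactory parserFactory = SAXParserFactory.newInstance();
        parserFactory.setNamespaceAware(true);
        XMLReader reader = parserFactory.newSAXParser().getXMLReader();

        // The parser talks to the validator, not to the custom handler directly.
        reader.setContentHandler(validator);
        reader.parse(new InputSource(xml.toURI().toString()));
    }
}
```

An existing DefaultHandler subclass can be passed as the handler argument unchanged, since DefaultHandler implements ContentHandler. If the documents carry xsi:schemaLocation or xsi:noNamespaceSchemaLocation hints, the no-argument SchemaFactory.newSchema() is documented to create a Schema that validates using those hints, which may remove the need to pass an XSD file at all.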

Michael Kay
  • Thank you for pointing out the basic principle. Unfortunately, I'm still having problems with my implementation, since my parser does not accept my validator as a content handler. I'm using javax.xml.parsers.SAXParserFactory to create a javax.xml.parsers.SAXParser, which uses an org.xml.sax.helpers.DefaultHandler, and I have now added a javax.xml.validation.ValidatorHandler, but this does not seem to work. – Wanderer Jul 20 '15 at 13:59
  • Sorry, but there's not enough information there to debug your code. – Michael Kay Jul 20 '15 at 21:13
  • You're right, the example code was bad. I tried to make a minimum SSCCE now, which should show the problem and offer the necessary information. – Wanderer Jul 21 '15 at 17:37
  • A ValidatorHandler is supposed to do validation. You've written your own ValidatorHandler which doesn't do anything at all. You should be using a ValidatorHandler (e.g. the one in Xerces or the one in Saxon) which does real validation against a schema. You get this by creating a SchemaFactory, calling its newSchema() method to compile a schema, and then calling Schema.newValidatorHandler() to create the ValidatorHandler. – Michael Kay Jul 22 '15 at 13:30
  • Okay, two more questions: 1. I know that the ValidatorHandler I posted in the SSCCE is just a dummy, but even when I do it the way you described in your last comment, I still get the error I described in my SSCCE (the part written in comments). 2. How do I specify which handler I want to use? I thought it should be an argument to newValidatorHandler(), but this does not seem to be possible. – Wanderer Aug 11 '15 at 09:42
  • You give the system a schema, you then call newValidatorHandler() and it gives you back a validator that you can use to validate documents against that schema. Sorry you're having such difficulty with this concept. – Michael Kay Aug 11 '15 at 17:33