
I am trying to create a Saxon XPathCompiler. I have the same code in Java & .NET, each calling the appropriate Saxon library. The code is:

protected void ctor(InputStream xmlData, InputStream schemaFile, boolean preserveWhiteSpace) throws SAXException, SchemaException, SaxonApiException {
    this.rootNode = makeDataSourceNode(null);
    XMLReader reader = XMLReaderFactory.createXMLReader();

    InputSource xmlSource = new InputSource(xmlData);
    SAXSource saxSource = new SAXSource(reader, xmlSource);
    Source schemaSource = new StreamSource(schemaFile);
    Configuration config = createEnterpriseConfiguration();
    config.addSchemaSource(schemaSource);
    // ...

In the .NET case, the InputStreams are instances of a class that wraps a .NET Stream and exposes it as a Java InputStream. In Java the above code works fine. But in .NET, the last line, config.addSchemaSource(schemaSource), throws:

$exception {"Content is not allowed in prolog."} org.xml.sax.SAXParseException

In both Java & .NET it works fine if there is no schema.

The files it is using are http://www.thielen.com/test/SouthWind.xml & http://www.thielen.com/test/SouthWind.xsd

It does not appear to be any of the issues in this question. And if that were the issue, shouldn't both Java and .NET have the same problem?

I'm thinking maybe it's the wrapper around the .NET Stream to make it a Java InputStream, but we use that class everywhere without any other issues.

David Thielen

1 Answer


The "Content is not allowed in prolog" exception is absolutely infuriating: if only it told you which bytes it is complaining about! One diagnostic technique is to display the initial bytes delivered by the InputStream by making a few calls to

System.err.println(schemaFile.read()); // prints the next byte as an int (0-255), or -1 at end of stream

My first guess as to the cause would be something to do with byte order marks, but rather than speculate, I would focus on diagnostics to see what the parser is seeing in that InputStream that it doesn't like.
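Calling read() directly consumes the bytes, so the parser would then see a truncated stream. As a minimal sketch of the same diagnostic, assuming you can wrap the stream before handing it to the parser, the helper below peeks at the first few bytes through a PushbackInputStream and pushes them back afterwards. The class and method names (StreamPeek, peekHex, guessBom) are hypothetical and not part of Saxon or the JDK, and the sample bytes are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;

// Hypothetical diagnostic helper; not part of Saxon or the JDK.
public class StreamPeek {

    /** Reads up to max bytes, pushes them back onto the stream, and returns them as hex. */
    public static String peekHex(PushbackInputStream in, int max) throws IOException {
        byte[] buf = new byte[max];
        int n = 0;
        while (n < max) {
            int r = in.read(buf, n, max - n);
            if (r < 0) break;              // end of stream reached early
            n += r;
        }
        if (n > 0) in.unread(buf, 0, n);   // restore the bytes for the real consumer
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("%02X", buf[i] & 0xFF));
        }
        return sb.toString();
    }

    /** Very rough byte order mark check on the hex string produced by peekHex. */
    public static String guessBom(String hex) {
        if (hex.startsWith("EF BB BF")) return "UTF-8 BOM";
        if (hex.startsWith("FE FF")) return "UTF-16BE BOM";
        if (hex.startsWith("FF FE")) return "UTF-16LE BOM"; // could also start a UTF-32LE BOM
        return "no BOM";
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample: a UTF-16LE BOM followed by "<?" encoded as UTF-16LE.
        byte[] sample = {(byte) 0xFF, (byte) 0xFE, '<', 0, '?', 0};
        // The pushback buffer must be at least as large as the number of bytes peeked.
        PushbackInputStream in =
                new PushbackInputStream(new ByteArrayInputStream(sample), 8);
        String hex = peekHex(in, 8);
        System.err.println(hex + " -> " + guessBom(hex));
    }
}
```

In the question's code, the idea would be to wrap schemaFile in a PushbackInputStream, print the peeked bytes, and then build the StreamSource from the wrapped stream rather than the original one. Comparing the output between Java and .NET should show whether the wrapper delivers different leading bytes.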

Michael Kay
  • I fixed it by explicitly setting both files to utf-8 encoding (previously the xml file had no spec and the xsd was utf-16). But encoding should not make a difference... – David Thielen Jun 26 '18 at 01:20
  • Hey, you wrote Saxon. Can't you give us more info on the exception? thanks - dave – David Thielen Jun 26 '18 at 01:42
  • Saxon calls an external XML parser to do the parsing: by default it uses the XML parser in the JDK. Sometimes you can get better diagnostics by configuring Saxon to use a different parser (but Xerces has pretty well cornered the market these days). – Michael Kay Jun 26 '18 at 12:02
  • Oh... Ok, now this all makes sense. – David Thielen Jun 26 '18 at 14:57