I have some Delphi code that reads and validates XML files against an XSD document. I am using the Windows DOM (TXMLDocument). This article explains the underlying logic.

It works on some computers (i.e., it throws an exception for offending tags), but on a newer computer it does not throw any exception.

Is there a setting in Windows I need to change to get it to work? Or does anyone know of a native Delphi component to validate XML?

XSD File: http://www.nemsis.org/media/XSD/EMSDataSet.xsd

Sample XML (note that E02_02 is required to have a positive value according to the XSD at xyz.com/DataSet.xsd):

<EMSDataSet xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.nemsis.org" xsi:schemaLocation="http://myfakedomain.com/DataSet.xsd">
<Header>
<Record>
  <E02>
    <E02_01>123</E02_01>
    <E02_02>0</E02_02>
  </E02>
</Record>
</Header>
</EMSDataSet>

Delphi Code:

XMLDoc:= TXMLDocument.Create(nil);
try
  XMLDoc.ParseOptions:= [poResolveExternals, poValidateOnParse];
  XMLDoc.LoadFromFile(filetocheck);
  XMLDoc.Active:= True;
except
  on E:EDOMParseError do begin
    showMessage(e.Message);
  end;
end;    

Exception:

The element: '{http://www.nemsis.org}E02_02'  has an invalid value according to its data type.  Line: 20  <E02_02>0</E02_02>
M Schenkel
  • What exception are you expecting it to throw, and why are you expecting it? Have you tried catching `Exception` instead of `EDOMParseError`? Can you show an example XML that is not failing how you are expecting it to fail? Also, what type is `XMLDoc` declared as - `TXMLDocument` or `IXMLDocument`? Are you aware that `IXMLDocument` is *required* when creating `TXMLDocument` with a `nil` Owner? – Remy Lebeau Jun 04 '15 at 21:07
  • BTW, setting `Active:= True` is redundant in this example because `Active` is already True when `LoadFromFile()` exits without error. – Remy Lebeau Jun 04 '15 at 21:12
  • There's not a lot to go on here. Perhaps you could provide the XML. – David Heffernan Jun 05 '15 at 07:26
  • @RemyLebeau - I provided more information. NOTE - it really isn't about poorly formatted XML. This is about the XML not conforming to the rules set in the XSD file. – M Schenkel Jun 05 '15 at 13:39
  • @DavidHeffernan Added sample XML and clarified question. NOTE: it is not that the XML is poorly formatted. This is more about the contents of tags not conforming to the rules specified in the XSD. – M Schenkel Jun 05 '15 at 13:40
  • Can you show the actual XSD, or at least its `` declaration? MSXML has limitations on whether it will *automatically* invoke an external schema while loading an XML document. – Remy Lebeau Jun 05 '15 at 16:23
  • @RemyLebeau Here it is: http://www.nemsis.org/media/XSD/EMSDataSet.xsd As I say, it actually validates on one computer. But on another computer it doesn't even seem to make an attempt (I don't think it is making a call to download the schema file). – M Schenkel Jun 08 '15 at 13:43
  • @RemyLebeau - I am trying modifications based on your answer now. will keep you posted. – M Schenkel Jun 08 '15 at 13:54

2 Answers

TXMLDocument does not directly support enabling XSD validation when using MSXML, so it is MSXML's responsibility to manage it. Enabling the poResolveExternals and poValidateOnParse flags is important for that, but there are some other factors to consider. Most importantly, although MSXML does support referencing an XSD from within the XML, it has some limitations on whether the referenced XSD will actually be used while loading the XML:

Referencing XSD Schemas in Documents

To reference an XML Schema (XSD) schema from an XML document in MSXML 6.0, you can use any one of the following means to link a schema to an XML document so that MSXML will use the schema to validate the document contents.

  • Reference the XSD schema in the XML document using XML schema instance attributes such as either xsi:schemaLocation or xsi:noNamespaceSchemaLocation.

  • Add the XSD schema file to a schema cache and then connect that cache to the DOM document or SAX reader, prior to loading or parsing the XML document.

...

The xsi:schemaLocation attribute works well in situations where namespace prefixes are explicitly declared and used in the XML document you want to validate.

The following example shows an XML document that references an external XSD schema, MyData.xsd, for use in validating nodes that are in the 'urn:MyData' namespace URI, which is mapped to the "MyData:" namespace prefix.

<catalog xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
  xsi:schemaLocation="urn:MyData http://www.example.com/MyData.xsd">
  <MyData:book xmlns:MyData="urn:MyData">
     <MyData:title>Presenting XML</MyData:title>
     <MyData:author>Richard Light</MyData:author>
  </MyData:book>
</catalog>

In order for the MyData.xsd file to be paired with and used to validate element and attribute nodes that start with the "MyData:" prefix, the schema needs to use and contain the following schema attributes:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
        xmlns:MyData="urn:MyData"
        targetNamespace="urn:MyData"
        elementFormDefault="qualified">

These attributes declare the 'urn:MyData' namespace URI and the "MyData:" namespace prefix so that they correspond identically to how these declarations were made in the XML file. If they do not match, the schema at the specified location would never be invoked during validation.

You have not shown your XSD yet, but the XML you have shown does not conform to the rules mentioned in the above documentation. In particular, you are missing the use of a urn namespace mapping, and prefixes on the XML nodes that you want to validate. Some versions of MSXML might handle this better than others, which could explain why the validation works on some machines and is ignored on other machines, depending on which versions of MSXML are installed.
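
One more thing worth checking: the XML Schema specification defines `xsi:schemaLocation` as a whitespace-separated list of namespace URI / schema location pairs, which is the form the documentation sample above uses. The value in the XML as posted is a single URL with no namespace in front of it. Below is a minimal sketch for checking that, assuming the attribute value has already been extracted as a string. The helper name `DescribeSchemaLocation` is hypothetical and only for illustration.

```delphi
program CheckSchemaLocation;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: splits an xsi:schemaLocation value on whitespace and
// reports whether the tokens form complete (namespace, location) pairs.
function DescribeSchemaLocation(const Value: string): string;
var
  Tokens: TStringList;
  I: Integer;
begin
  Tokens := TStringList.Create;
  try
    // ExtractStrings splits on the separators and does not add empty tokens.
    ExtractStrings([' ', #9, #10, #13], [' ', #9, #10, #13], PChar(Value), Tokens);
    if (Tokens.Count = 0) or Odd(Tokens.Count) then
      Result := Format('malformed: %d token(s), expected namespace/location pairs',
        [Tokens.Count])
    else
    begin
      Result := '';
      I := 0;
      while I < Tokens.Count do
      begin
        Result := Result + Format('namespace=%s location=%s; ',
          [Tokens[I], Tokens[I + 1]]);
        Inc(I, 2);
      end;
    end;
  finally
    Tokens.Free;
  end;
end;

begin
  // Value from the question's sample XML: a single token, so no pair.
  Writeln(DescribeSchemaLocation('http://myfakedomain.com/DataSet.xsd'));
  // Value from the documentation sample: one complete pair.
  Writeln(DescribeSchemaLocation('urn:MyData http://www.example.com/MyData.xsd'));
end.
```

If the real files use the single-URL form, adding the target namespace in front of the location (for example `http://www.nemsis.org` followed by the XSD URL) would be the first thing to try.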

That being said, you may have to resort to the second approach mentioned in the documentation:

  • Add the XSD schema file to a schema cache and then connect that cache to the DOM document or SAX reader, prior to loading or parsing the XML document.

That requires using MSXML directly; you can't do it with TXMLDocument alone:

MSXML also provides a means to connect and use a schema cache to store, load and connect a schema to an XML document, such as in the following VBScript code excerpt:

'Create the schema cache and add the XSD schema to it.
set oSC = CreateObject("MSXML2.XMLSchemaCache.6.0")
oSC.Add "urn:MyData", "http://www.example.com/MyData.xsd"
'Create the DOM document and assign the cache to its schemas property.
set oXD = CreateObject("MSXML2.DOMDocument.6.0")
oXD.schemas = oSC
'Set properties, load and validate it in the XML DOM.

The gotcha is that you have to know where the XSD is located in order to hook it up to the parser. So, you would have to load the XML once just to extract the XSD location, then load the XSD into a schema cache, and then re-load the XML with the XSD attached. Here are some Delphi examples of that:

schema validation with msxml in delphi

function TForm1.ValidXML2(const xmlFile: String;
  out err: IXMLDOMParseError): Boolean;
var
  xml, xml2, xsd: IXMLDOMDocument2;
  schemas, cache: IXMLDOMSchemaCollection;
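begin
  Result := False; // stays False if the XML fails to load or declares no namespaces
  xml := CoDOMDocument.Create;
  xml.async := False; // load synchronously so the result of load() is meaningful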
  if xml.load(xmlFile) then
  begin
    schemas := xml.namespaces;
    if schemas.length > 0 then
    begin
      xsd := CoDOMDocument40.Create;
      xsd.Async := False;
      xsd.load(schemas.namespaceURI[0]);
      cache := CoXMLSchemaCache40.Create;
      cache.add(schemas.namespaceURI[1], xsd);
      xml2 := CoDOMDocument40.Create;
      xml2.async := False;
      xml2.schemas := cache;
      Result := xml2.load(xmlFile);
      //err := xml.validate;
      if not Result then
        err := xml2.parseError
      else
        err := nil;
    end;
  end;
end;

How to validate a IXMLDocument against a XML Schema?

unit XMLValidate;

// Requirements ----------------------------------------------------------------
//
// MSXML 4.0 Service Pack 1
// http://www.microsoft.com/downloads/release.asp?releaseid=37176
//
// -----------------------------------------------------------------------------

interface

uses
  SysUtils, XMLIntf, xmldom, XMLSchema;

type
  EValidateXMLError = class(Exception)
  private
    FErrorCode: Integer;
    FReason: string;
  public
    constructor Create(AErrorCode: Integer; const AReason: string);
    property ErrorCode: Integer read FErrorCode;
    property Reason: string read FReason;
  end;

procedure ValidateXMLDoc(const Doc: IDOMDocument; const SchemaLocation, SchemaNS: WideString); overload;
procedure ValidateXMLDoc(const Doc: XMLIntf.IXMLDocument; const SchemaLocation, SchemaNS: WideString); overload;
procedure ValidateXMLDoc(const Doc: IDOMDocument; const Schema: IXMLSchemaDoc); overload;
procedure ValidateXMLDoc(const Doc: XMLIntf.IXMLDocument; const Schema: IXMLSchemaDoc); overload;

implementation

uses
  Windows, ComObj, msxmldom, MSXML2_TLB;

resourcestring
  RsValidateError = 'Validate XML Error (%.8x), Reason: %s';

{ EValidateXMLError }

constructor EValidateXMLError.Create(AErrorCode: Integer; const AReason: string);
begin
  inherited CreateResFmt(@RsValidateError, [AErrorCode, AReason]);
  FErrorCode := AErrorCode;
  FReason := AReason;
end;

{ Utility routines }

function DOMToMSDom(const Doc: IDOMDocument): IXMLDOMDocument2;
begin
  Result := ((Doc as IXMLDOMNodeRef).GetXMLDOMNode as IXMLDOMDocument2);
end;

function LoadMSDom(const FileName: WideString): IXMLDOMDocument2;
begin
  Result := CoDOMDocument40.Create;
  Result.async := False;
  Result.resolveExternals := True; //False;
  Result.validateOnParse := True;
  Result.load(FileName);
end;

{ Validate }

procedure InternalValidateXMLDoc(const Doc: IDOMDocument; const SchemaDoc: IXMLDOMDocument2; const SchemaNS: WideString);
var
  MsxmlDoc: IXMLDOMDocument2;
  SchemaCache: IXMLDOMSchemaCollection;
  Error: IXMLDOMParseError;
begin
  MsxmlDoc := DOMToMSDom(Doc);
  SchemaCache := CoXMLSchemaCache40.Create;
  SchemaCache.add(SchemaNS, SchemaDoc);
  MsxmlDoc.schemas := SchemaCache;
  Error := MsxmlDoc.validate;
  if Error.errorCode <> S_OK then
    raise EValidateXMLError.Create(Error.errorCode, Error.reason);
end;

procedure ValidateXMLDoc(const Doc: IDOMDocument; const SchemaLocation, SchemaNS: WideString);
begin
  InternalValidateXMLDoc(Doc, LoadMSDom(SchemaLocation), SchemaNS);
end;

procedure ValidateXMLDoc(const Doc: XMLIntf.IXMLDocument; const SchemaLocation, SchemaNS: WideString);
begin
  InternalValidateXMLDoc(Doc.DOMDocument, LoadMSDom(SchemaLocation), SchemaNS);
end;

procedure ValidateXMLDoc(const Doc: IDOMDocument; const Schema: IXMLSchemaDoc);
begin
  InternalValidateXMLDoc(Doc, DOMToMSDom(Schema.DOMDocument), '');
end;

procedure ValidateXMLDoc(const Doc: XMLIntf.IXMLDocument; const Schema: IXMLSchemaDoc);
begin
  InternalValidateXMLDoc(Doc.DOMDocument, DOMToMSDom(Schema.DOMDocument), '');
end;

end.

Doc := LoadXMLData(XmlFileEdit.Lines.Text);
ValidateXMLDoc(Doc, FSchemaFileName, 'http://www.foo.com');

XML Documents, Schemas and Validation

var
  XML, XSDL: Variant;
begin
  XSDL := CreateOLEObject('MSXML2.XMLSchemaCache.4.0');
  XSDL.validateOnLoad := True;
  XSDL.add('','MySchema.xsd'); // 1st argument is target namespace
  ShowMessage('Schema Loaded');
  XML := CreateOLEObject('MSXML2.DOMDocument.4.0');
  XML.validateOnParse := True;
  XML.resolveExternals := True;
  XML.schemas := XSDL;
  XML.load('file.xml');
  ShowMessage(XML.parseError.reason);
end.
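
The examples above target MSXML 4.0 (`CoDOMDocument40`, `MSXML2.DOMDocument.4.0`). That version is a separate download, and a "class not registered" error is typical on machines where it is not installed. MSXML 6.0 ships with current Windows versions, so here is a minimal late-bound sketch of the same schema-cache approach using the 6.0 ProgIDs from the VBScript excerpt. The helper name `ValidationError`, the file names, and the use of a local copy of the XSD are assumptions for illustration. The namespace passed to `add` has to equal the XSD's `targetNamespace`.

```delphi
program ValidateWithMsxml6;
{$APPTYPE CONSOLE}

uses
  SysUtils, ActiveX, ComObj;

// Hypothetical helper: returns '' when XmlFile validates against the schema,
// otherwise the parser's error text with the offending line number.
function ValidationError(const XmlFile, SchemaNS, XsdLocation: string): string;
var
  Cache, Doc: Variant;
begin
  // Create the schema cache and add the XSD to it (raises EOleException
  // if the schema cannot be loaded or compiled).
  Cache := CreateOleObject('MSXML2.XMLSchemaCache.6.0');
  Cache.add(SchemaNS, XsdLocation);

  // Create the DOM document and attach the cache before loading.
  Doc := CreateOleObject('MSXML2.DOMDocument.6.0');
  Doc.async := False;
  Doc.validateOnParse := True;
  Doc.schemas := Cache;

  if Doc.load(XmlFile) then
    Result := ''
  else
    Result := Format('%s (line %d)',
      [Trim(string(Doc.parseError.reason)), Integer(Doc.parseError.line)]);
end;

begin
  CoInitialize(nil);
  try
    try
      // Assumed inputs: a local XML file and a local copy of the XSD.
      Writeln(ValidationError('file.xml', 'http://www.nemsis.org', 'EMSDataSet.xsd'));
    except
      on E: Exception do
        Writeln(E.ClassName, ': ', E.Message);
    end;
  finally
    CoUninitialize;
  end;
end.
```

Because the schema is attached explicitly, this does not depend on MSXML honoring the `xsi:schemaLocation` hint inside the XML. If the XSD pulls in other files through includes or imports, additional settings may be needed for MSXML 6.0 to resolve them. That part is not covered by this sketch.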
Remy Lebeau
  • Trying the **schema validation with msxml in delphi** implementation. I had to modify **xml := CoDOMDocument.Create;** to **schema validation with msxml in delphi**. When I run I get "Class Not Registered". I have registered **msxml6.dll** on the computer. – M Schenkel Jun 08 '15 at 15:04
I know this question is tagged for Delphi, but I thought some Embarcadero C++ Builder users might benefit from seeing a C++ implementation of Remy's last example using MSXML2 OLE objects.

I know I wish someone would have posted this a few days ago. XD

.h file:

//------------------------------------------------------------------------------
#ifndef XmlValidatorUH
#define XmlValidatorUH
//------------------------------------------------------------------------------
class PACKAGE TXmlValidator
{
private:
    Variant FSchemaCache;
    Variant FXmlDomDoc;

    // TAutoCmd Variables
    Procedure   CacheProcAdd;
    PropertySet CacheSetValidateOnLoad;

    Procedure   XmlProcLoadXml;
    PropertySet XmlSetValidateOnParse;
    PropertySet XmlSetResolveExternals;
    PropertySet XmlSetSchemas;
    PropertyGet XmlGetParseError;

    PropertyGet ParseErrorGetReason;

public:
    __fastcall TXmlValidator( String _SchemaLocation );

    String __fastcall ValidationError( String _Xml );

};
//------------------------------------------------------------------------------

#endif

.cpp file:

//------------------------------------------------------------------------------
#include <vcl.h>
#pragma hdrstop
//------------------------------------------------------------------------------
#include "XmlValidatorU.h"
#include <System.Win.ComObj.hpp>
//------------------------------------------------------------------------------
#pragma package(smart_init)
//------------------------------------------------------------------------------
// Validates XML against Schema
//------------------------------------------------------------------------------
// This class uses OLE objects from MSXML2 to validate XML from an XSD file.
// Generally, use the following steps to deal with OLE objects:
//  1. Define a Variant variable for your OLE Object; assign using CreateOleObject().
//  2. Define your TAutoCmd objects that will be used in Variant.Exec()
//  3. Set TAutoCmd args using << to add settings
//  4. Once everything is set up, call Exec() on your OLE Object variant
// More documentation on OLE objects / TAutoCmd at:
//  http://docwiki.embarcadero.com/CodeExamples/Rio/en/AutoCmd_(C%2B%2B)
//------------------------------------------------------------------------------
// This macro clarifies that we're registering OLE Function names to our defined TAutoCmd variables.
//
#define RegisterAutoCmd( _AutoCmd, _OleFunc ) _AutoCmd( _OleFunc )
//------------------------------------------------------------------------------
// These macros clear AutoCmdArgs before setting them.
// I made these because setting an arg multiple times just stacks them up, changing the function signature.
// Then, OLE throws a "Member Not Found" error because it can't find a function with that signature.
//
#define AutoCmdArg( _AutoCmd, _Arg ) _AutoCmd.ClearArgs(); _AutoCmd << _Arg
#define AutoCmdArgs( _AutoCmd, _Arg1, _Arg2 ) AutoCmdArg( _AutoCmd, _Arg1 ); _AutoCmd << _Arg2
//------------------------------------------------------------------------------
__fastcall TXmlValidator::TXmlValidator( String _SchemaLocation )
    :
    RegisterAutoCmd( CacheProcAdd,              "add"               ),
    RegisterAutoCmd( CacheSetValidateOnLoad,    "validateOnLoad"    ),
    RegisterAutoCmd( XmlProcLoadXml,            "loadXML"           ),
    RegisterAutoCmd( XmlSetValidateOnParse,     "validateOnParse"   ),
    RegisterAutoCmd( XmlSetResolveExternals,    "resolveExternals"  ),
    RegisterAutoCmd( XmlSetSchemas,             "schemas"           ),
    RegisterAutoCmd( XmlGetParseError,          "parseError"        ),
    RegisterAutoCmd( ParseErrorGetReason,       "reason"            )
{
    if ( _SchemaLocation.IsEmpty() ) 
    { 
        throw Exception( String( __FUNC__ ) + " - Missing Schema Location" );
    }

    // Instantiate the OLE objects
    FSchemaCache    = CreateOleObject( "MSXML2.XMLSchemaCache.4.0"  );
    FXmlDomDoc      = CreateOleObject( "MSXML2.DOMDocument.4.0"     );

    // Set static args that shouldn't change
    AutoCmdArg( CacheSetValidateOnLoad, true );
    AutoCmdArg( XmlSetValidateOnParse,  true );
    AutoCmdArg( XmlSetResolveExternals, true );

    const AnsiString NoNameSpace = "";
    AutoCmdArgs( CacheProcAdd, NoNameSpace, AnsiString( _SchemaLocation ) );

    // Load Cache
    FSchemaCache.Exec( CacheSetValidateOnLoad   );  // Validate on Load
    FSchemaCache.Exec( CacheProcAdd             );  // Add Schema file location to the cache

    // Now that the cache is loaded, set cached schema as arg to XML
    AutoCmdArg( XmlSetSchemas, FSchemaCache );
}
//------------------------------------------------------------------------------
String __fastcall TXmlValidator::ValidationError( String _Xml )
{
    AutoCmdArg( XmlProcLoadXml, AnsiString( _Xml ) );

    FXmlDomDoc.Exec( XmlSetValidateOnParse  );
    FXmlDomDoc.Exec( XmlSetResolveExternals );
    FXmlDomDoc.Exec( XmlSetSchemas          );
    FXmlDomDoc.Exec( XmlProcLoadXml         );

    Variant ParseErr = FXmlDomDoc.Exec( XmlGetParseError );

    return ParseErr.Exec( ParseErrorGetReason );
}
//------------------------------------------------------------------------------
Remy Lebeau
luaphacim