General XSD to check if XML is well-formed?

Question

We have a system where XML files are imported, checked against an xsd and then processed.

Now we have a case where we want to transfer arbitrary objects this way: a Java object is serialized into XML and later imported, checked against an XSD, and processed.

As we do not know beforehand exactly what the object will look like, we want to use an XSD that is extremely generic and checks only that the XML is well-formed, not that specific nodes are present.

I tried finding such a general XSD, but all I found were websites that check well-formedness for you, whereas I need an XSD that does a similar check.

Does anyone know of such an XSD, or how I can create one? Ideally it would say:

"XML has a header and a data area. Header area is structured content, I know how to describe that part. Data area can be anything. I don't care what it is, I simply accept it if it is XML."

If I cannot find an xsd that accepts such unspecific content, I would revert to skipping the xsd validation in this case, but that would be an awkward solution as I'd have to change a well established general import function that I hope I don't need to touch.

score 2 · Answer 1 · answered Jan 29 '16 at 02:23

Yes, you can check well-formedness, or the validity of just part of a document, with an XSD validator.

As others have pointed out, if you really want to check merely for well-formedness, you don't need an XSD validation step at all.

But it says here they are wrong to say you cannot use an XSD validation step to check for well-formedness: all you need is an essentially vacuous schema and a validator that you can invoke in 'lax validation' mode (which essentially validates elements and attributes against matching declarations -- in a vacuous schema, none will be found). Since any normal XSD validator will parse the XML if you hand it XML (instead of, say, a DOM object), well-formedness will be checked as a side effect. (It's possible to argue, of course, that in that case the well-formedness checking is not really part of the XSD validation process, just a necessary accompaniment to it. I am among the people who enjoy such casuistry; I don't have the impression that you care about such distinctions.)

But in fact you say you know how to describe (and thus, I suppose, to validate) the header area, and it's just the payload area that should be unconstrained. For that you want a schema of approximately the following form. It's very similar in basic idea to the sketch provided by asmith1024, except that that schema uses an explicit wildcard for its objects element, while this one just relies on the default type of xs:anyType; one consequence is that this one's tns:payload element will accept character data as content, while the objects element won't.

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema 
  xmlns:xs="http://www.w3.org/2001/XMLSchema" 
  elementFormDefault="qualified"
  targetNamespace="http://example.com/nss/target"
  xmlns:tns="http://example.com/nss/target"> 

  <!--* a message contains a header and 
      * a payload. *-->
  <xs:complexType name="message">
    <xs:sequence>
      <xs:element ref="tns:header"/>
      <xs:element ref="tns:payload"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="message" type="tns:message"/>

  <!--* a header has a defined structure
      * (to be specified ...) *-->
  <xs:element name="header">
    <!--* ... your definition of header 
        * validity here ... *-->
  </xs:element>

  <!--* other types and elements used in
      * header ... *-->

  <!--* A payload has NO defined structure. *-->

  <!--* no definition of any type for payload,
      * so it defaults to xs:anyType, and 
      * can contain ... any well-formed XML
      * content. *-->
  <xs:element name="payload"/>

</xs:schema>
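
For what it's worth, here is a minimal Java sketch of running a schema of this shape through the standard javax.xml.validation API. Only the message/header/payload layout comes from the schema above; the header content (a single tns:sender element), the class name OpenPayloadDemo, and the sample documents are assumptions made up for illustration. Under those assumptions, arbitrary well-formed content inside tns:payload should be accepted, while a message without a header, or input that is not well-formed, should be rejected.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class OpenPayloadDemo {
    // Hypothetical filled-in variant of the schema above. The header
    // content (one tns:sender element) is an assumption for illustration.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
      + " elementFormDefault='qualified'"
      + " targetNamespace='http://example.com/nss/target'"
      + " xmlns:tns='http://example.com/nss/target'>"
      + "<xs:complexType name='message'><xs:sequence>"
      + "<xs:element ref='tns:header'/>"
      + "<xs:element ref='tns:payload'/>"
      + "</xs:sequence></xs:complexType>"
      + "<xs:element name='message' type='tns:message'/>"
      + "<xs:element name='header'><xs:complexType><xs:sequence>"
      + "<xs:element name='sender' type='xs:string'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      // No type given, so payload defaults to xs:anyType.
      + "<xs:element name='payload'/>"
      + "</xs:schema>";

    // Returns true if the document parses and validates against XSD.
    // A SAXException here covers both well-formedness and validity errors.
    static boolean isValid(String xml) throws SAXException, IOException {
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // A broken schema is a programming error, so let that exception escape.
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String ns = "http://example.com/nss/target";
        String freePayload = "<message xmlns='" + ns + "'>"
            + "<header><sender>app</sender></header>"
            + "<payload><anything xmlns='' a='1'><nested/></anything>text</payload>"
            + "</message>";
        String noHeader = "<message xmlns='" + ns + "'><payload/></message>";
        System.out.println("free payload valid: " + isValid(freePayload));
        System.out.println("missing header valid: " + isValid(noHeader));
    }
}
```

Because the validator parses the input itself, a document that is not well-formed fails with a parse error as a side effect, without any schema rule having to express well-formedness.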

score 1 · Answer 2 · answered Jan 19 '16 at 10:53

You could try something like this:

  <?xml version="1.0" encoding="UTF-8"?>
  <xs:schema 
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://test.any.org"
    xmlns="http://test.any.org"
    elementFormDefault="qualified">
    <xs:element name="objects" nillable="true">
      <xs:complexType>
        <xs:sequence>
          <xs:any minOccurs="0" maxOccurs="unbounded" processContents="lax"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
  </xs:schema>

What you're saying with this is any well-formed XML is valid if it comes enclosed in an {http://test.any.org}objects element.

This will handle an empty list:

<objects xmlns="http://test.any.org"/>

A null list:

<objects xmlns="http://test.any.org" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/>

And a list of heterogeneous objects from any namespace (or none):

<any:objects 
  xmlns:any="http://test.any.org"     
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <objectA type="someType" value="someValue"/>
  <objectB xmlns="http://some.external.schema" xsi:nil="true"/>
  <any2:objectC 
    xmlns:any2="http://another.external.schema"    
    xmlns:any3="http://some.funky.attribute">
    <any2:type any3:attr1="hello">Some Type</any2:type>
    <any2:value any3:attr2="whoops">Some Value</any2:value>
  </any2:objectC>
</any:objects>

Of course, if you receive a document that does not have an objects element as its root, you're going to have to enclose it in one first.

score 0 · Answer 3 · answered Jan 19 '16 at 10:47


No, you cannot do this. A validating XML parser needs to match the root element of the input XML document to an element declaration in the schema. If that cannot be done, the validation of course fails.

But nothing stops you from validating known content against schemas and checking unknown content only for well-formedness.

answered Jan 19 '16 at 10:47

forty-two


The problem with this account is that it assumes schema validation has a binary result; it doesn't. A matching element declaration or type is needed for the result to be *valid*, but documents not matched by declarations in the schema are not necessarily *invalid*. And it's possible to set up validation processes so that they fail on invalid input, and proceed on input which is either valid or has validity=notKnown. – C. M. Sperberg-McQueen Jan 29 '16 at 02:27

score 0 · Answer 4 · edited May 23 '17 at 10:27


No*

An XSD cannot check that XML is well-formed. An XSD can only be used to check that XML is valid. Any XML parser will report whether or not an XML document is well-formed; no XSD is required for that.
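
To illustrate the point that a plain parser is enough, here is a minimal Java sketch using the standard JAXP DocumentBuilder. The class and method names (WellFormedCheck, isWellFormed) are hypothetical, and treating any SAXException as "not well-formed" is a simplifying assumption.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {
    // Parses the text without any schema; a parse failure means the
    // document is not well-formed (assumption: no other SAX errors occur).
    static boolean isWellFormed(String xml)
            throws ParserConfigurationException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // DefaultHandler rethrows fatal errors without printing to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isWellFormed("<a><b/></a>"));
        System.out.println(isWellFormed("<a><b></a>"));
    }
}
```

No element or attribute needs to be declared anywhere for this check; the parser only enforces the XML grammar itself.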

For the difference between well-formed and valid, see Well-formed vs Valid XML.

For a very general XSD, see XML Schema that allows anything (xsd:any).

* ...unless you have a means of creating a vacuous XSD and invoking validation in lax mode.

edited May 23 '17 at 10:27

Community


answered Jan 19 '16 at 13:14

kjhughes



Are you quite sure about that **No**? I agree with you that no XSD is *needed* for well-formedness checking. But does that mean it can't be used, though unnecessary? Well-formedness is a prerequisite for the production of an infoset by a conforming XML parser, so why wouldn't invoking XSD validation with a vacuous schema be a way to perform well-formedness checking? Especially given an existing system which wants an XSD validation step, even for documents of unknown content? – C. M. Sperberg-McQueen Jan 29 '16 at 02:30
Well I thought I was sure until I read the deeper insight reflected in [your answer](http://stackoverflow.com/a/35075825/290085). :-) Thank you for taking the time to explain this. I've qualified my "no" but left my answer intact as I think it still retains a modicum of value as a first-level approximation of the truth. – kjhughes Jan 29 '16 at 04:14
Well edited! The addition of the footnoted qualification addresses the issues very well, I think. I'm glad you left the answer here. – C. M. Sperberg-McQueen Jan 29 '16 at 14:54
