I have a working XSD that rejects XML instances containing invalid whitespace. Valid content contains no carriage return (#xD), line feed (#xA) or tab (#x9) characters, does not begin or end with a space (#x20) character, and contains no sequence of two or more adjacent space characters (see the schema below for details).
Sample XSD:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified"
           targetNamespace="http://www.example.com"
           xmlns:test="http://www.example.com">
  <xs:import namespace="http://www.w3.org/XML/1998/namespace" schemaLocation="http://www.w3.org/2001/xml.xsd"/>
  <xs:element name="test-token" type="test:Tokenized500Type"></xs:element>
  <xs:simpleType name="Tokenized500Type">
    <xs:annotation>
      <xs:documentation>An element of this type has a minimum length of one character and a maximum of 500,
        must not contain carriage return (#xD), line feed (#xA) or tab (#x9) characters, must
        not begin or end with a space (#x20) character, and must not contain a sequence of two or
        more adjacent space characters.</xs:documentation>
    </xs:annotation>
    <xs:restriction base="xs:string">
      <xs:maxLength value="500"/>
      <xs:minLength value="1"/>
      <xs:pattern value="\S+( \S+)*"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
I tested this with instances containing literal whitespace characters, and those are rejected as expected.
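To make the intended behaviour of the restriction concrete, here is a minimal Python sketch (not an XSD processor) that emulates the three facets of Tokenized500Type on an already-parsed string value. Two assumptions are baked in: XSD patterns are implicitly anchored to the whole value, so the sketch uses re.fullmatch; and XSD's \S excludes only space, tab, LF and CR, whereas Python's Unicode \S also excludes characters such as U+00A0, so the class is spelled out as [^ \t\n\r]. The helper name is_valid_tokenized500 is hypothetical.

```python
import re

# XSD's \S means "not space, tab, LF or CR"; Python's \S also excludes
# other Unicode whitespace (e.g. U+00A0), so spell the class out explicitly.
NON_WS = r"[^ \t\n\r]"
# XSD patterns are implicitly anchored, hence re.fullmatch below.
TOKEN_PATTERN = re.compile(rf"{NON_WS}+( {NON_WS}+)*")


def is_valid_tokenized500(value: str) -> bool:
    """Hypothetical helper emulating the facets of Tokenized500Type.

    `value` is the element's text after XML parsing; xs:string has
    whiteSpace="preserve", so no further normalization is applied.
    """
    if not 1 <= len(value) <= 500:  # minLength / maxLength, counted in characters
        return False
    return TOKEN_PATTERN.fullmatch(value) is not None  # pattern facet


for sample in ["foo bar", " foo", "foo  bar", "foo\tbar", "foo\nbar", "x" * 501]:
    print(repr(sample[:12]), is_valid_tokenized500(sample))
```

Note that the minLength facet is effectively redundant here, since the pattern already requires at least one non-whitespace character.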
What if the relevant element content contains escaped whitespace instead, i.e. character references such as &#x9; or &#xA;? Will that cause a validation error or not?
Here's an example instance with the escaped version:
<?xml version="1.0" encoding="UTF-8"?>
<test-token xmlns="http://www.example.com" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.example.com"> </test-token>
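The question hinges on when character references are expanded. In XML 1.0 the parser replaces a reference such as &#x9; with the character it denotes before a schema validator sees the value, so the pattern facet is applied to a real tab. The minimal Python sketch below illustrates this with the standard library's xml.etree.ElementTree (a non-validating parser, used here only to show which value would reach validation); the sample instances and helper names are hypothetical. It also shows one asymmetry: a literal CR/LF pair in the document is normalized to a single LF during parsing, whereas &#xD; survives as a carriage return.

```python
import re
import xml.etree.ElementTree as ET

NS = "{http://www.example.com}"
NON_WS = r"[^ \t\n\r]"  # XSD \S: anything except space, tab, LF, CR
TOKEN_PATTERN = re.compile(rf"{NON_WS}+( {NON_WS}+)*")


def parsed_value(instance: str) -> str:
    """Return the text of <test-token> as the parser hands it on (hypothetical helper)."""
    root = ET.fromstring(instance)
    if root.tag != NS + "test-token":
        raise ValueError(f"unexpected root element {root.tag}")
    return root.text or ""


def matches_pattern(value: str) -> bool:
    # XSD patterns are implicitly anchored, hence fullmatch.
    return TOKEN_PATTERN.fullmatch(value) is not None


# Hypothetical instances using character references instead of literal whitespace.
escaped_tab = '<test-token xmlns="http://www.example.com">foo&#x9;bar</test-token>'
escaped_cr = '<test-token xmlns="http://www.example.com">foo&#xD;bar</test-token>'
literal_crlf = '<test-token xmlns="http://www.example.com">foo\r\nbar</test-token>'
escaped_space = '<test-token xmlns="http://www.example.com">foo&#x20;bar</test-token>'

for name, doc in [("escaped_tab", escaped_tab), ("escaped_cr", escaped_cr),
                  ("literal_crlf", literal_crlf), ("escaped_space", escaped_space)]:
    value = parsed_value(doc)
    print(name, repr(value), matches_pattern(value))
```

Under these assumptions, escaping does not hide whitespace from the pattern: only escaped_space (a single reference to an ordinary space between two words) yields a value that matches.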
See also:
Meaning of xs:token for XSD processor: Will an instance with an xsd:token-type element containing whitespace pass validation?
XSD restriction to allow only xs:token whitespace: What is the regular expression for the set of strings that validate exactly the same for xsd:token and xsd:string?