
Short question

Is XML case-sensitive?

Longer question

For example:

<Shirt color="Red"/>

The color attribute is a string restricted to a set of valid colors (Red, Blue and Green).

To validate the XML, I used the following XSD:

  <xs:simpleType name="ColorType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="Red"/>
      <xs:enumeration value="Blue"/>
      <xs:enumeration value="Green"/>
    </xs:restriction>
  </xs:simpleType>

Am I expected to accept different case variations of Red, Blue and Green? Or is XML widely accepted as case-sensitive?

Ian

3 Answers


Short Answer:

Yes - XML is case sensitive.

Longer Answer:

It is widely accepted as case-sensitive. However, if you want to be more flexible about what you accept, take a look at the question below, which discusses case-insensitive enumerations:

XML Schema Case Insensitive Enumeration of Simple Type String
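As a quick illustration of the short answer, here is a minimal sketch using Python's standard `xml.etree.ElementTree` parser (the choice of parser is an assumption made for illustration; any conforming XML parser should behave the same way). It shows that attribute values, attribute names and element names are all compared without case folding:

```python
import xml.etree.ElementTree as ET

# Attribute values are compared as-is: "Red" and "red" are different strings.
shirt = ET.fromstring('<Shirt color="Red"/>')
print(shirt.get("color") == "Red")   # True
print(shirt.get("color") == "red")   # False

# Attribute names are case-sensitive too, so "Color" is not "color".
print(shirt.get("Color"))            # None

# Element names must match exactly between the start tag and the end tag.
try:
    ET.fromstring("<Shirt></shirt>")
    mismatched_ok = True
except ET.ParseError:
    mismatched_ok = False
print(mismatched_ok)                 # False
```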

Joe DF
Jon Egerton
  • Longer answer: there's nothing to stop you writing an XML application which is case-insensitive. But it wouldn't be expected or usual. – Matthew Wilson Sep 14 '11 at 10:30

With XSD 1.1 you can achieve a case-insensitive enumeration using an assertion:

<xs:simpleType name="RGB">
  <xs:restriction base="xs:string">
    <xs:assertion test="lower-case($value) = ('red', 'green', 'blue')"/>
  </xs:restriction>
</xs:simpleType>

XSD 1.1 is supported in recent releases of Saxon and Xerces.
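If no XSD 1.1 processor is available, the same check can be done in application code after parsing. The following is only a sketch of the logic the assertion above expresses, not a replacement for schema validation; the names `ALLOWED_COLORS` and `is_valid_color` are hypothetical:

```python
# Lower-cased forms of the permitted colors, mirroring the list in the assertion.
ALLOWED_COLORS = ("red", "green", "blue")

def is_valid_color(value: str) -> bool:
    """Return True if value matches an allowed color, ignoring case."""
    # Lower-case the input, then test membership; this corresponds to the
    # XPath comparison of lower-case($value) against the sequence of strings.
    return value.lower() in ALLOWED_COLORS

print(is_valid_color("Red"))     # True
print(is_valid_color("GREEN"))   # True
print(is_valid_color("Purple"))  # False
```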

Michael Kay
  • Just be aware of using XSD 1.1, at the current time it is just a W3C recommendation - Xerces with XSD 1.1 validation is a standalone artifact in beta state, and XSD 1.1 is not supported by the JDK, not even by the most recent one 1.8. It isn't even planned for JDK 1.9 as far as I know. You cannot use advanced XML technologies like JAXB based on XSD 1.1 built-in from the JDK this way. – René Dec 29 '15 at 17:19
  • Yes, you need to be cautious, but @René's comment needs qualification. Firstly, "just a W3C recommendation": well, so is XSD 1.0. "Recommendation" is what W3C calls a finished, final, ratified spec. Yes, it's true there are only three implementations of XSD 1.1 currently (Saxon, Xerces, and Altova), and this is a factor you should take into account. But don't be held back by what's in the JDK - the JDK has long abandoned support for the latest W3C standards (e.g. it doesn't even support XPath 2.0) but there are plenty of third-party libraries to fill the gap. – Michael Kay Dec 29 '15 at 22:26
  • Of course, it depends on the technology used. If you implement low-level parsing yourself, you can use a 3rd-party parser library (Xerces for XSD 1.1 is still in beta; there are two different artifacts of the same Xerces version!). As for JAXB - @Michael: do you know of a 3rd-party JAXB implementation or derivative that makes use of XSD 1.1 and thus generates classes using, for example, "alternatives"? Anyway, it's up to Ian to choose depending on his needs. – René Dec 31 '15 at 09:47

Yes. Examining the current XML specification, the only apparent statement of this is in Section 1.2, where the authors state:

match: (Of strings or names:) Two strings or names being compared are identical. Characters with multiple possible representations in ISO/IEC 10646 (e.g. characters with both precomposed and base+diacritic forms) match only if they have the same representation in both strings. No case folding is performed.

Note the reference to a lack of case folding. As an example of the trouble that inattention to this detail can cause, I recently analysed some code that extracts data from a moderate-sized (a few hundred GB) corpus of physiological data in which one parameter name had been changed from SPO2 to SpO2 over the course of years within the primary software.

In contrast, the software used to extract the data faithfully preserved the XML convention and saw the parameters as distinct. There was worse to come. Because the parameter name was used to name the CSV file to which data were written, and this was in Windows, which case-folds its filenames, two handles were being opened on the same file, resulting in mysterious errors.
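One way to catch this kind of clash before any files are opened is to group the parameter names by a case-folded key and flag any group with more than one spelling. This is a minimal sketch, not the code from the project described above; `find_case_collisions` is a hypothetical helper, and `str.casefold()` only approximates the case-insensitive filename matching that Windows performs:

```python
from collections import defaultdict

def find_case_collisions(names):
    """Group names that differ only by case; return the groups with 2+ spellings."""
    groups = defaultdict(list)
    for name in names:
        # Names that are distinct in XML share one key once case is folded.
        groups[name.casefold()].append(name)
    return {key: variants for key, variants in groups.items() if len(variants) > 1}

collisions = find_case_collisions(["SPO2", "SpO2", "HR"])
print(collisions)  # {'spo2': ['SPO2', 'SpO2']}
```

Any name reported here would map to the same file on a case-insensitive file system, so it needs a disambiguating suffix or a deliberate merge before being used as a filename.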