106

I ran one of my xml files through a schema generator and everything generated was what was expected, with the exception of one node:

<xs:element name="office" type="xs:NCName"/>

What exactly is xs:NCName? And why would one use it rather than xs:string?

jasso

4 Answers

127

@skyl practically provoked me to write this answer, so please excuse the redundancy.

NCName stands for "non-colonized name". NCName can be defined as an XML Schema regular expression [\i-[:]][\c-[:]]*

...and what does that regex mean?

\i and \c are multi-character escapes defined in the XML Schema specification:
http://www.w3.org/TR/xmlschema-2/#dt-ccesN
\i is the escape for the set of initial XML name characters and \c is the escape for the set of XML name characters. [\i-[:]] denotes the set \i minus the set that consists of the colon character :. In plain English, it means "any initial name character except :". The whole regular expression reads as "one initial XML name character other than a colon, followed by zero or more XML name characters other than colons."
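
Python's re module has neither the \i and \c escapes nor XML Schema's class subtraction, so a port has to spell the character ranges out. The following is a minimal sketch, assuming the NameStartChar and NameChar ranges of XML 1.0 (Fifth Edition) with the colon removed; these may differ slightly from the character tables that XML Schema 1.0 refers to. The names NCNAME and is_ncname are hypothetical helpers, not part of any library.

```python
import re

# Ranges taken from the NameStartChar production of XML 1.0 (Fifth Edition),
# with the colon deliberately left out (assumption: Fifth Edition tables).
_START = (
    "A-Z_a-z"
    "\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    "\u0370-\u037D\u037F-\u1FFF\u200C-\u200D"
    "\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    "\uF900-\uFDCF\uFDF0-\uFFFD"
    "\U00010000-\U000EFFFF"
)

# NameChar additionally allows hyphen, dot, digits, the middle dot,
# and two ranges of combining/connector characters.
_REST = _START + r"\-.0-9" + "\u00B7\u0300-\u036F\u203F-\u2040"

# One start character followed by zero or more name characters.
NCNAME = re.compile("[" + _START + "][" + _REST + "]*")


def is_ncname(value: str) -> bool:
    """Return True if the whole string matches the NCName sketch above."""
    return NCNAME.fullmatch(value) is not None
```

With this sketch, is_ncname("office") is True, while is_ncname("ns:name") and is_ncname("1abc") are both False.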

Practical restrictions of an NCName

In practice, an NCName cannot contain symbol characters such as :, @, $, %, &, /, +, ,, ;, nor whitespace characters or brackets of any kind. Furthermore, an NCName cannot begin with a digit, a dot, or a hyphen, although these can appear later in an NCName.

Where are NCNames needed

In namespace-conformant XML documents, all names must be either qualified names or NCNames. The following values must be NCNames (not qualified names):

  • namespace prefixes
  • values representing an ID
  • values representing an IDREF
  • values representing a NOTATION
  • processing instruction targets
  • entity names
jasso
  • The line 'Furthermore an NCName cannot begin with a number' helped me understand that a number can't be an 'xs:ID' – Sean Murphy Mar 23 '16 at 23:04
  • How can I convert that expression to a programming language like Java or JS? – calbertts May 04 '16 at 20:14
  • @calbertts, See https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html – Kirby Aug 02 '16 at 19:23
  • You can check whether it is a regular NCName with the regex: "[abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_][\\w\\.\\-\\d]*". That means the value should start with a letter or underscore and then consist of word characters, dots, dashes, and digits. You can try it at: https://regexr.com/ – Naxos84 Sep 20 '17 at 06:31
  • My regex given above only handles Latin letters. If you want the full check on NCNames according to the specification https://www.w3.org/TR/1999/REC-xml-names-19990114/#NT-NCName you should use this class: http://www.java2s.com/Code/Java/XML/CheckswhetherthesuppliedStringisanNCNameNamespaceClassifiedName.htm – Naxos84 Sep 20 '17 at 07:55
99

An NCName is a non-colonized name, e.g. "name", as opposed to a QName, which is a qualified name, e.g. "ns:name". If your names are not supposed to be qualified by different namespaces, then they are NCNames.

xs:string puts no restrictions on your names at all, whereas xs:NCName restricts the value to a valid XML name and, most notably, does not allow ":" to appear in it.
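
The relationship between the two can be sketched in a few lines of Python: a QName is either a single NCName or two NCNames joined by exactly one colon. This is only an illustration; split_qname is a hypothetical helper, and the pattern is an ASCII-only simplification of NCName that rejects valid non-ASCII names.

```python
import re

# ASCII-only simplification of NCName (assumption: real NCNames also
# allow many non-ASCII letters, which this pattern rejects).
_ASCII_NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def split_qname(name):
    """Return (prefix, local_part); prefix is None for an unprefixed name."""
    prefix, colon, local = name.partition(":")
    if not colon:
        # No colon at all: the whole name is the local part.
        prefix, local = None, name
    parts = [local] if prefix is None else [prefix, local]
    # Every part must itself be colon-free, i.e. an NCName.
    if not all(_ASCII_NCNAME.fullmatch(part) for part in parts):
        raise ValueError(f"not a valid QName: {name!r}")
    return prefix, local
```

Here split_qname("ns:name") returns ("ns", "name") and split_qname("name") returns (None, "name"), while "a:b:c" raises ValueError because the local part would still contain a colon.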

Andrey Adamovich
29

Practically speaking...

Allowed characters: -, ., 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z, _, a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z ; plus all non-ASCII characters matching \p{L}+.

Also, digits, - and . cannot be used as the first character of the value.

Disallowed characters: space, !, ", #, $, %, &, ', (, ), *, +, ,, /, :, ;, <, =, >, ?, @, [, \, ], ^, `, {, |, }, ~
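
The ASCII part of these lists can be reproduced with plain set membership. A small sketch, assuming only the ASCII subset matters (non-ASCII letters are ignored here); START_CHARS, NAME_CHARS, is_ascii_ncname, and DISALLOWED are hypothetical names chosen for illustration.

```python
import string

# ASCII characters that may start an NCName: letters and underscore.
START_CHARS = set(string.ascii_letters + "_")

# ASCII characters allowed anywhere after the first position.
NAME_CHARS = START_CHARS | set(string.digits + "-.")


def is_ascii_ncname(value):
    """ASCII-only check: valid first character, then only name characters."""
    return (
        len(value) > 0
        and value[0] in START_CHARS
        and all(ch in NAME_CHARS for ch in value)
    )


# Printable ASCII characters (space through ~) that can never appear.
DISALLOWED = [chr(c) for c in range(0x20, 0x7F) if chr(c) not in NAME_CHARS]
```

DISALLOWED contains the space character and the 29 punctuation characters other than -, ., and _, which is the same set as the disallowed list given in this answer.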

izilotti
8

http://books.xmlschemata.org/relaxng/ch19-77215.html

No spaces or colons. Allows "_" and "-".

You would use this instead of string so that validation restricts the value to what is allowed. It maps well to certain name/identifier conventions, such as Django's concept of a "slug".

I will upvote the person who translates [\i-[:]][\c-[:]]* into English for us.

Skylar Saveland
  • I added an answer that translates `[\i-[:]][\c-[:]]*` into English. Go ahead and upvote, as you promised ;) – jasso May 28 '11 at 01:03