What is an xs:NCName type and when should it be used?

Question

I ran one of my xml files through a schema generator and everything generated was what was expected, with the exception of one node:

<xs:element name="office" type="xs:NCName"/>

What exactly is xs:NCName? And why would one use it rather than xs:string?

score 127 · Answer 1 · answered May 28 '11 at 01:00

@skyl practically provoked me to write this answer so please mind the redundancy.

NCName stands for "non-colonized name". An NCName can be defined by the XML Schema regular expression [\i-[:]][\c-[:]]*

...and what does that regex mean?

\i and \c are multi-character escapes defined in the XML Schema specification:
http://www.w3.org/TR/xmlschema-2/#dt-ccesN
\i is the escape for the set of initial XML name characters and \c is the escape for the set of XML name characters. [\i-[:]] means the set \i minus the set that consists of the colon character :. In plain English: "any initial name character, but not :". The whole regular expression reads as "one initial XML name character, but not a colon, followed by zero or more XML name characters, none of which is a colon."
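
Most general-purpose regex engines do not know \i and \c, so the expression has to be spelled out by hand. Here is a minimal Python sketch. It assumes the NameStartChar and NameChar ranges from XML 1.0 (Fifth Edition) with the colon removed. XML Schema 1.0 defines \i and \c through the older Letter-based tables, so a few edge-case characters may be classified differently. The names NCNAME_RE and is_ncname are made up for this illustration.

```python
import re

# NameStartChar from XML 1.0 (Fifth Edition), with ":" left out.
# Assumption: this approximates \i minus ":"; it is not the exact
# XML Schema 1.0 definition, which uses the older Letter tables.
_START = (
    "A-Z_a-z"
    "\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    "\u0370-\u037D\u037F-\u1FFF"
    "\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF"
    "\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD"
    "\U00010000-\U000EFFFF"
)

# NameChar adds "-", ".", ASCII digits, the middle dot and combining marks.
_REST = _START + r"\-.0-9" + "\u00B7\u0300-\u036F\u203F-\u2040"

# One start character followed by zero or more name characters.
NCNAME_RE = re.compile(f"[{_START}][{_REST}]*")


def is_ncname(value: str) -> bool:
    """Return True if the whole string matches the NCName pattern above."""
    return NCNAME_RE.fullmatch(value) is not None
```

With this sketch, is_ncname("office") and is_ncname("_a-1.b") are true, while is_ncname("ns:office"), is_ncname("1office") and is_ncname("") are false.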

Practical restrictions of an NCName

The practical restrictions of NCName are that it cannot contain several symbol characters like :, @, $, %, &, /, +, ,, ;, whitespace characters, or any kind of parentheses or brackets. Furthermore an NCName cannot begin with a number, dot or minus character, although they can appear later in an NCName.
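
To see which of these rules a given value breaks, a small diagnostic helper can be useful. The following is only an ASCII sketch: real NCNames also allow non-ASCII letters such as é, which this version would report as illegal. The names ASCII_START, ASCII_REST and explain_ncname are hypothetical.

```python
import string

# ASCII-only assumption: non-ASCII letters are valid in real NCNames
# but are treated as illegal by this simplified check.
ASCII_START = set(string.ascii_letters + "_")
ASCII_REST = ASCII_START | set(string.digits + "-.")


def explain_ncname(value: str) -> str:
    """Return "ok" or a short reason why value is not an ASCII NCName."""
    if not value:
        return "empty string"
    if value[0] not in ASCII_START:
        return f"cannot start with {value[0]!r}"
    for position, char in enumerate(value):
        if char not in ASCII_REST:
            return f"illegal character {char!r} at index {position}"
    return "ok"
```

For example, explain_ncname("2nd-floor") returns "cannot start with '2'" and explain_ncname("ns:office") returns "illegal character ':' at index 2".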

Where are NCNames needed

In namespace-conformant XML documents all names must be either qualified names or NCNames. The following values must be NCNames (not qualified names):

namespace prefixes
values representing an ID
values representing an IDREF
values representing a NOTATION
processing instruction targets
entity names
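
The relationship between the two kinds of name can be sketched in a few lines of Python: a qualified name is either a bare NCName or two NCNames joined by exactly one colon. The helper name split_qname is hypothetical, and the sketch only checks where the colon sits. It does not validate the characters of the prefix or local part.

```python
from typing import Optional, Tuple


def split_qname(name: str) -> Tuple[Optional[str], str]:
    """Split "prefix:local" into (prefix, local); prefix is None without a colon."""
    prefix, colon, local = name.partition(":")
    if not colon:
        # No colon at all: the name is unprefixed.
        return None, name
    if not prefix or not local or ":" in local:
        # Empty prefix, empty local part, or a second colon.
        raise ValueError(f"not a qualified name: {name!r}")
    return prefix, local
```

Here split_qname("xs:NCName") gives ("xs", "NCName") and split_qname("office") gives (None, "office"), while "a:b:c" raises ValueError because the local part would contain a colon.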

The line 'Furthermore an NCName cannot begin with a number' helped me understand that a number can't be an 'xs:ID' — Sean Murphy, Mar 23 '16 at 23:04
How can I convert that expression to a programming language like Java or JS? — calbertts, May 04 '16 at 20:14
@calbertts, See https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html — Kirby, Aug 02 '16 at 19:23
You can check whether it is a regular NCName with the regex: "[abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_][\\w\\.\\-\\d]*". That means the value should start with a letter or underscore and then contain only word characters, dots, dashes, underscores and digits. You can try it at: https://regexr.com/ — Naxos84, Sep 20 '17 at 06:31
My regex given above only handles Latin letters. If you want the full check on NCNames according to the specification https://www.w3.org/TR/1999/REC-xml-names-19990114/#NT-NCName you should use this class: http://www.java2s.com/Code/Java/XML/CheckswhetherthesuppliedStringisanNCNameNamespaceClassifiedName.htm — Naxos84, Sep 20 '17 at 07:55

score 99 · Accepted Answer · answered Oct 27 '09 at 15:11

An NCName is a non-colonized name, e.g. "name", as opposed to a QName, which is a qualified name, e.g. "ns:name". If your names are not supposed to be qualified by different namespaces, then they are NCNames.

xs:string puts no restrictions on your names at all, whereas xs:NCName restricts the value to a valid XML name and, in particular, disallows ":" anywhere in the string.

Andrey Adamovich

empty string is also disallowed in `xs:NCName` – WeizhongTu Mar 05 '18 at 07:05

score 29 · Answer 3 · izilotti · 2022-10-24T15:11:43.557

Practically speaking...

Allowed characters: -, ., 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z, _, a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z; plus non-ASCII letters, roughly those matching \p{L}+.

Also, digits, - and . cannot be used as the first character of the value.

Disallowed characters: space, !, ", #, $, %, &, ', (, ), *, +, ,, /, :, ;, <, =, >, ?, @, [, \, ], ^, `, {, |, }, ~

edited Oct 24 '22 at 15:11

answered Jun 05 '13 at 20:17

I think this is missing lots of allowed characters like, for example, é or ø. – Eric Bloch Jul 19 '13 at 16:05
To cover those non-ascii cases, it should include \p{L}+ as part of the character set – Kenston Choi Jul 22 '13 at 03:26
Digits cannot be used as the first character, either. – Thilo Feb 05 '15 at 05:36

score 8 · Answer 4 · answered May 26 '11 at 16:54

http://books.xmlschemata.org/relaxng/ch19-77215.html

No spaces or colons. Allows "_" and "-".

You would use this instead of xs:string so that the validator can check that the value is limited to what is allowed. It maps well to certain naming and identifier conventions, such as Django's concept of a "slug".
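
In the same spirit as a slug function, free text can be squeezed into something that passes as an NCName. A rough ASCII-only sketch follows. The helper name to_ncname_slug is hypothetical, non-ASCII letters are simply replaced even though real NCNames allow them, and the mapping is lossy, so different inputs can collide.

```python
import re


def to_ncname_slug(text: str) -> str:
    """Turn arbitrary text into an ASCII string that is a valid NCName."""
    # Replace each run of characters outside the ASCII NCName set with "-",
    # then drop any dashes left at either end.
    slug = re.sub(r"[^A-Za-z0-9_.-]+", "-", text.strip()).strip("-")
    # An NCName must start with a letter or "_", so prefix "_" when the
    # result is empty or starts with a digit, "." or "-".
    if not slug or not re.match(r"[A-Za-z_]", slug):
        slug = "_" + slug
    return slug
```

For instance, to_ncname_slug("Head Office: 2nd floor") yields "Head-Office-2nd-floor", and to_ncname_slug("42 Main St.") yields "_42-Main-St.".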

I will upvote the person who translates [\i-[:]][\c-[:]]* into English for us.

Skylar Saveland

I added an answer that translates `[\i-[:]][\c-[:]]*` into English. Go ahead and upvote, as you promised ;) – jasso May 28 '11 at 01:03
