.Net / CLR Identifiers

Question

I was wondering, what characters are accepted in .Net identifiers?

Not C# or VB.Net, but the CLR.

The reason I ask is that I was looking at how yield return statements are implemented (C# in Depth), and saw that they compile into code like:

public int <count>5__1;

Are there any other identifier characters that I could use? This code would not be public.

score 2 · Answer 1 · answered Jul 19 '11 at 09:11

This is governed by the CLS specification, chapter 8.5.1 "Valid names":

CLS Rule 4: Assemblies shall follow Annex 7 of Technical Report 15 of the Unicode Standard 3.0 governing the set of characters permitted to start and be included in identifiers, available on-line at http://www.unicode.org/unicode/reports/tr15/tr15-18.html. Identifiers shall be in the canonical format defined by Unicode Normalization Form C. For CLS purposes, two identifiers are the same if their lowercase mappings (as specified by the Unicode locale-insensitive, one-to-one lowercase mappings) are the same. That is, for two identifiers to be considered different under the CLS they shall differ in more than simply their case. However, in order to override an inherited definition the CLI requires the precise encoding of the original declaration be used.

In other words, it doesn't specify a list of verboten characters; it is only concerned with being able to compare strings without surprises, which is all the CLR ever has to do. The job of a compiler is much harder: it must be able to recognize tokens in the program, which is the job of the lexer. Practical lexer implementations set rules on which characters are valid in an identifier, such as not being able to start an identifier with a digit.
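
To make the comparison in CLS Rule 4 concrete, here is a minimal sketch. It assumes C# 9 or later (top-level statements), and `SameUnderCls` is a made-up helper, not a CLR or BCL API. `ToLowerInvariant` is used only as an approximation of the "locale-insensitive, one-to-one lowercase mappings" the rule refers to.

```csharp
using System;
using System.Text;

// Hypothetical helper: approximates the CLS Rule 4 notion of "same identifier".
static bool SameUnderCls(string a, string b)
{
    // Rule 4 requires Normalization Form C, then compares lowercase mappings.
    // ToLowerInvariant is a stand-in for the locale-insensitive mapping.
    string na = a.Normalize(NormalizationForm.FormC).ToLowerInvariant();
    string nb = b.Normalize(NormalizationForm.FormC).ToLowerInvariant();
    return string.Equals(na, nb, StringComparison.Ordinal);
}

// Differ only by case: the same identifier as far as the CLS is concerned.
Console.WriteLine(SameUnderCls("Count", "count"));          // True

// Precomposed U+00E9 vs. 'e' + combining acute U+0301: equal once in Form C.
Console.WriteLine(SameUnderCls("caf\u00E9", "cafe\u0301")); // True

// Genuinely different names stay different.
Console.WriteLine(SameUnderCls("count", "count1"));         // False
```

Note that nothing here rejects any character; the rule is about comparing names, not about which characters may appear in them.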

Hans Passant


Good answer, but note that this is specifying CLS requirements, not CLR requirements. – kvb Jul 19 '11 at 16:44
Any CLR implementation I know follows CLS rules. – Hans Passant Jul 19 '11 at 16:53
I don't believe that's quite correct; the CLS rules indicate a restricted subset of the CLR's behavior that compilers should target if interoperation with other languages is desired (see section 7 of Partition I of the spec). However, the CLR clearly supports non-CLS-compliant types, and the spec specifically mentions that types which are not visible outside of an assembly do not need to follow the CLS rules. – kvb Jul 19 '11 at 17:23
Some of the rules themselves also make this clear. For example, rule 5 states that "All names introduced in a CLS-compliant scope shall be distinct independent of kind, except where the names are identical and resolved via overloading. That is, while the CTS allows a single type to use the same name for a method and a field, the CLS does not." – kvb Jul 19 '11 at 17:27
You can try creating a type which violates this rule (e.g. via Reflection.Emit or ilasm) and the runtime won't have any problem with it. – kvb Jul 19 '11 at 17:28
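
The experiment suggested in the last comment could look roughly like the sketch below. It uses Reflection.Emit to build a type whose field and method share a name, which Rule 5 forbids for CLS-compliant code. It assumes C# 9 or later for top-level statements; the names `Demo`, `Holder` and `Value` are invented for illustration.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

// Throwaway in-memory assembly; "Demo" and "Holder" are made-up names.
var asm = AssemblyBuilder.DefineDynamicAssembly(
    new AssemblyName("Demo"), AssemblyBuilderAccess.Run);
var module = asm.DefineDynamicModule("Demo");
var tb = module.DefineType("Holder", TypeAttributes.Public | TypeAttributes.Class);

// A field and a method that are both called "Value" (violates CLS Rule 5).
var field = tb.DefineField("Value", typeof(int), FieldAttributes.Public);
var method = tb.DefineMethod(
    "Value", MethodAttributes.Public, typeof(int), Type.EmptyTypes);

// Method body: return this.Value (the field).
var il = method.GetILGenerator();
il.Emit(OpCodes.Ldarg_0);
il.Emit(OpCodes.Ldfld, field);
il.Emit(OpCodes.Ret);

// The runtime loads the type and both members are usable via reflection.
var holder = tb.CreateType();
var instance = Activator.CreateInstance(holder);
holder.GetField("Value").SetValue(instance, 42);
var result = holder.GetMethod("Value").Invoke(instance, null);

Console.WriteLine(result);                           // 42
Console.WriteLine(holder.GetMember("Value").Length); // 2 (the field and the method)
```

C# itself would refuse to compile a class like this, which is the point: the restriction comes from the language and the CLS, not from the runtime.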

score 1 · Accepted Answer · answered Jul 19 '11 at 08:14

The C# spec says which characters can be used.

The CLR, however, allows much more. That is why the C# compiler can emit names like <count>5__1, which are not valid C# identifiers.

leppie


I expect what's allowed in the CLR is defined in the CLR specification. – Richard Jul 19 '11 at 08:24
@Richard: The CLR allows almost anything when quoted. Eg: `'!2ss<,'` is valid in IL. – leppie Jul 19 '11 at 08:27
Very interesting! Does this behaviour go back to older versions of the CLR as well? – Darkzaelus Jul 19 '11 at 08:35
@Darkzaelus: From what I know, yes all the way back to 1.0. – leppie Jul 19 '11 at 08:53
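
One way to check this without ilasm is Reflection.Emit. The sketch below defines fields with the compiler-style name from the question and the quoted name from the comment above, then reads them back through reflection. It assumes C# 9 or later for top-level statements; `Demo` and `Holder` are invented names, and this only demonstrates field names, not type or method names.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

// Throwaway in-memory assembly; "Demo" and "Holder" are made-up names.
var asm = AssemblyBuilder.DefineDynamicAssembly(
    new AssemblyName("Demo"), AssemblyBuilderAccess.Run);
var module = asm.DefineDynamicModule("Demo");
var tb = module.DefineType("Holder", TypeAttributes.Public);

// Neither of these is a legal C# identifier.
string[] names = { "<count>5__1", "!2ss<," };
foreach (var name in names)
    tb.DefineField(name, typeof(int), FieldAttributes.Public);

var holder = tb.CreateType();
var instance = Activator.CreateInstance(holder);

// The fields can only be reached by reflection (or IL), not from C# source.
foreach (var name in names)
{
    var f = holder.GetField(name);
    f.SetValue(instance, 7);
    Console.WriteLine($"{f.Name} = {f.GetValue(instance)}");
}
// Prints:
// <count>5__1 = 7
// !2ss<, = 7
```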
