Is someone able to explain why _variable$2 is a valid C programming identifier?

Based on the ISO documentation as well as a number of online supporting (credible) sources, I thought that only letters, digits and the _ (underscore) characters were allowed.

However, variable names such as _variable$2 compile and run normally when tested. If that is the case, which other special characters can and cannot be used in the same way? Is this limited to ordinary characters, or could even emoji be used in valid identifier names in the C programming language? Any help would be appreciated :)

phuclv
Vilitaria
  • duplicate of [$ in variable name?](https://stackoverflow.com/q/7926394/995714) – phuclv Jun 11 '18 at 03:54
  • C99 supports Unicode characters in identifiers; see [What constitutes a “valid” C Identifier?](https://stackoverflow.com/q/34319000/995714), [Unicode Identifiers and Source Code in C++11?](https://stackoverflow.com/q/5676978/995714), [ (and other unicode characters) in identifiers not allowed by g++](https://stackoverflow.com/q/12692067/995714) – phuclv Jun 12 '18 at 14:32

2 Answers

> Is someone able to explain why: '_variable$2' is a valid C programming identifier?

It isn't, in the sense that a strictly-conforming C program cannot use identifiers that contain the '$' character.

> I thought that only letters, digits and the '_' (underscore) characters were allowed.

Only the underscore, the decimal digits, the unaccented upper- and lowercase Latin letters, and universal character names are required to be allowed in C identifiers (universal character names were introduced in C99, and C11 widened the set of characters they may designate). However, the standard explicitly permits implementations to define other characters that they accept as well.

> However variable names such as '_variable$2' are completely valid and run as normal when compiled and tested.

That one implementation accepts such identifiers does not make them "completely valid". It just makes them valid in that implementation.
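As a minimal sketch of that distinction, the function below should compile with GCC or Clang in their default modes on common desktop targets. That is an assumption, since the question does not say which compiler was used. The name `dollar_demo` is hypothetical and exists only for illustration.

```c
/* Relies on an implementation extension: '$' is not part of the
   identifier character set that the C standard requires. */
int dollar_demo(void)
{
    /* Block scope is used on purpose: identifiers that begin with an
       underscore are reserved at file scope. */
    int _variable$2 = 40;

    _variable$2 += 2;
    return _variable$2;
}
```

Both compilers document an `-fno-dollars-in-identifiers` option. With it, the same source should be rejected, which is a quick way to confirm that the identifier is accepted as an extension rather than as standard C.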

> And if this is the case, what other special characters can and cannot be used similarly? Is this limited to simply characters or could even emoji be substituted into valid identifier names within the C programming language?

The standard specifies that the list of additional characters accepted in identifiers is implementation-defined. This has a specific meaning in the standard: conforming implementations must document their choices for all implementation-defined characteristics. Therefore, if you're willing to rely on the specific characteristics of one chosen implementation, you should find a list or description of that implementation's allowed extra characters in its documentation.

On the other hand, if you want your program to work unchanged with multiple different C implementations, then you should stick to only letters, digits, and the underscore, and maybe universal character names in identifiers.

And don't be too quick to overlook those universal character names: to the extent that emoji (and many other characters) are encoded by Unicode, you can use UCNs to include them in your identifiers, at least in a logical sense, provided that you are content to rely on C11.
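A short sketch of universal character names in identifiers follows. It assumes a compiler that accepts UCNs in identifiers in its default mode; recent GCC and Clang releases do as far as I know, while older GCC versions needed `-fextended-identifiers`. The function name `ucn_demo` and the variable names are hypothetical.

```c
/* \u00E9 is LATIN SMALL LETTER E WITH ACUTE and \u03C0 is GREEK SMALL
   LETTER PI; both lie in ranges the standard allows in identifiers. */
int ucn_demo(void)
{
    int caf\u00E9 = 3;            /* logically the identifier "café" */
    int \u03C0_times_100 = 314;   /* an identifier may also start with a UCN */

    return caf\u00E9 + \u03C0_times_100;
}
```

The `\uXXXX` spelling keeps the source file itself in plain ASCII, so the identifier does not depend on the encoding the compiler assumes for the file.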

John Bollinger

The current C standard, ISO/IEC 9899:2011, defines the set of characters that are allowed in identifiers in section 6.4.2. The list there ends with one of those rather uncomfortable phrases:

> other implementation-defined characters

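So, in theory, (nearly) anything goes, as long as the identifier does not start with a digit. Its first character must come from the group named identifier-nondigit, which contains the letters a-z in both upper- and lowercase, the underscore, universal character names, and those other implementation-defined characters.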

deamentiaemundi