Why can identifiers contain '$' in C?

Question

Recently I saw code like this:

int $ = 123;

So why can '$' be in an identifier in C?

Is it the same in C++?

It is an extension to the C language that many compilers implement by default. If you don't want it, you need to disable it explicitly. For gcc and clang, it would be an aptly named `-fno-dollars-in-identifiers` command line option. — n. m. could be an AI, Jul 11 '21 at 07:45
[$ symbol in C variable names](https://stackoverflow.com/q/28426733/995714), [What are the '@' and '$' for in C/C++?](https://stackoverflow.com/q/18494262/995714) — phuclv, Jul 11 '21 at 09:38
Tangential note: The VMS operating system was very fond of having dollar signs all over the place — calling the boot device SYS$SYSDEVICE and that sort of thing. I suspect that e.g. gcc acquired the habit of allowing $ in order to fit into that environment. — Ture Pålsson, Jul 11 '21 at 10:29
@Jens: If a C implementation allows `$` in identifiers, that is conforming to the C standard, and, when a program uses `$` in identifiers in a C implementation that allows it, that is also conforming to the C standard. — Eric Postpischil, Jul 11 '21 at 13:42

Isacc Barker · Accepted Answer · 2021-07-11T16:07:10.317

Using $ in identifiers is not good practice. Generally, you should use only alphanumeric characters and underscores in identifiers (`[A-Za-z0-9_]`), and an identifier cannot start with a digit.

Surface Level

Unlike some other languages (bash, Perl), C does not use $ to mark a variable, so the character has no other meaning in the language. Neither standard requires $ to be accepted, but both let an implementation allow additional implementation-defined characters in identifiers: for C see C11, 6.4.2, and for C++17 see Draft n4659. GCC and Clang accept $ by default.

As for your C++ question, let's test it!

int main(void) {
    int $ = 0;
    return $;
}

On GCC/G++/Clang/Clang++, this indeed compiles, and runs just fine.

Deeper Level

Compilers take source code, lex it into a token stream, parse that into an abstract syntax tree (AST), and then use that to generate code (e.g. assembly/LLVM IR). Your question really only concerns the first part, lexing.

The grammar of C/C++ (and thus the lexer implementation) gives $ no meaning of its own, unlike punctuators such as commas, periods, and arrows (->). A lexer that accepts $ in identifiers therefore keeps it as part of the identifier. Take the C code below:

int i_love_$ = 0;

After lexing, this becomes a token stream like this:

["int", "i_love_$", "=", "0", ";"]

If you were to take this code instead:

int i_love_$,_and_.s = 0;

The lexer would output a token stream like:

["int", "i_love_$", ",", "_and_", ".", "s", "=", "0", ";"]

As you can see, because C/C++ assigns no special meaning to $, the lexer keeps it inside the identifier, whereas punctuators such as the comma and the period split the text into separate tokens.
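To make the lexing step concrete, here is a minimal sketch of a toy tokenizer in C. It is not how GCC or Clang are implemented; `is_ident_char` and `tokenize` are hypothetical names, and the sketch simply assumes that $ counts as an identifier character and that every other non-space character is a one-character token.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Assumption: '$' is accepted in identifiers, as GCC and Clang do by
   default. Hypothetical helper, not taken from any real compiler. */
static int is_ident_char(int c) {
    return isalnum((unsigned char)c) || c == '_' || c == '$';
}

/* Splits src into tokens and writes them to out, separated by single
   spaces. Runs of identifier characters (which here also covers plain
   numbers) are read greedily; any other non-space character becomes a
   one-character token. Returns the number of tokens written. */
int tokenize(const char *src, char *out, size_t out_size) {
    size_t used = 0;
    int count = 0;

    if (out_size == 0) {
        return 0;
    }
    out[0] = '\0';

    while (*src != '\0') {
        if (isspace((unsigned char)*src)) {
            src++;
            continue;
        }

        const char *start = src;
        if (is_ident_char((unsigned char)*src)) {
            while (is_ident_char((unsigned char)*src)) {
                src++;
            }
        } else {
            src++;  /* punctuator: one character, one token */
        }

        size_t len = (size_t)(src - start);
        size_t need = len + (count > 0 ? 1 : 0);
        if (used + need + 1 > out_size) {
            break;  /* output buffer full, stop early */
        }
        if (count > 0) {
            out[used++] = ' ';
        }
        memcpy(out + used, start, len);
        used += len;
        out[used] = '\0';
        count++;
    }
    return count;
}
```

Calling `tokenize` on `int i_love_$,_and_.s = 0;` yields nine tokens: the $ stays inside `i_love_$`, while the comma and the period each end the identifier before them. A real lexer also has to handle multi-character punctuators, literals, and comments, which this sketch ignores.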

See C11, 6.4.2. It probably falls in "other implementation-defined characters" — Paul Ogilvie, Jul 11 '21 at 07:30

Eric Postpischil · Answer 2 · 2021-07-11T09:45:57.727

The 2018 C standard says in 6.4.2 1 that an identifier consists of a nondigit character followed by zero or more nondigit or digit characters, where the nondigit characters are:

one of the characters _, a through z, or A through Z,
a universal-character-name, which is \u followed by four hexadecimal digits or \U followed by eight hexadecimal digits, that is outside certain ranges¹, or
implementation-defined characters.

The digit characters are 0 through 9.

Taking GCC as an example, its documentation says these additional characters are defined in its preprocessor section, and that section says GCC accepts $ and the characters that correspond to the universal character names.² Thus, allowing $ is a choice made by the compiler implementors.
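The rule can be sketched as a small checker. This is only an illustration under stated assumptions: `is_identifier` and its `allow_dollar` parameter are hypothetical names, the flag stands in for an implementation's choice to add $ to the nondigit set, universal character names and keywords are not handled, and the letter ranges assume an ASCII-compatible character set.

```c
#include <stdbool.h>
#include <stddef.h>

/* The nondigit characters every implementation must accept: _, a-z, A-Z.
   Assumes an ASCII-compatible character set for the range checks. */
static bool is_basic_nondigit(char c) {
    return c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

static bool is_digit(char c) {
    return c >= '0' && c <= '9';
}

/* Returns true if s is a nondigit followed by zero or more nondigits or
   digits. allow_dollar models an implementation that adds '$' to the
   nondigit set (hypothetical parameter, for illustration only). */
bool is_identifier(const char *s, bool allow_dollar) {
    if (s == NULL || *s == '\0') {
        return false;
    }
    for (size_t i = 0; s[i] != '\0'; i++) {
        char c = s[i];
        bool nondigit = is_basic_nondigit(c) || (allow_dollar && c == '$');
        if (nondigit) {
            continue;
        }
        if (i > 0 && is_digit(c)) {
            continue;  /* digits are allowed anywhere except first */
        }
        return false;
    }
    return true;
}
```

With `allow_dollar` set, both `$` and `i_love_$` pass; with it cleared, both are rejected, which mirrors what `-fno-dollars-in-identifiers` asks GCC and Clang to do.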

Draft n4659 of the 2017 C++ standard has the same rules, in clause 5.10 [lex.name], except it limits the universal character names further.

Footnotes

¹ These \u and \U forms allow you to write any character as a hexadecimal code. The excluded ranges are those in C’s basic character set and codes reserved for control characters and special uses.

² The “universal character names” are the \u and \U forms. The characters that correspond to them are the characters that those forms represent. For example, π is a universal character, and \u03c0 is the universal character name for it.
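As a small illustration of footnote 2, the sketch below spells an identifier with the universal character name for π. The names `\u03c0_times_100` and `pi_times_100` are made up for this example, and it assumes a compiler in C99 mode or later that accepts universal character names in identifiers, as recent GCC and Clang versions do.

```c
/* \u03c0 is the universal character name for the Greek letter pi, so the
   variable below is named "pi_times_100" with pi written as one letter. */
int \u03c0_times_100 = 314;

/* Returns the value stored in the variable whose name starts with pi. */
int pi_times_100(void) {
    return \u03c0_times_100;
}
```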
