Recently I saw code like this:
int $ = 123;
So why can '$' be in an identifier in C?
Is it the same in C++?
This is not good practice. Generally, you should only use alphanumeric characters and underscores in identifiers ([a-zA-Z0-9_]).
Unlike some other languages (bash, perl), C does not use $ to denote a variable, so the character has no special meaning in the language. The standards do not list $ among the identifier characters themselves, but they do let an implementation accept additional, implementation-defined characters in identifiers: see C11, 6.4.2.1, and for C++17, Draft n4659 ([lex.name]). In practice, modern compilers accept $ as an extension.
As for your C++ question, let's test it!
int main(void) {
    int $ = 0;
    return $;
}
On GCC/G++/Clang/Clang++, this indeed compiles, and runs just fine.
Compilers take source code, lex it into a token stream, parse that into an abstract syntax tree (AST), and then use that to generate code (e.g. assembly/LLVM IR). Your question really only concerns the first part (i.e. lexing).
The grammar (and thus the lexer implementation) of C/C++ does not treat $ as special, unlike commas, periods, skinny arrows, etc. Take the following C code:
int i_love_$ = 0;
After the lexer, this becomes a token stream like this:
["int", "i_love_$", "=", "0"]
If you were to take this code:
int i_love_$,_and_.s = 0;
The lexer would output a token stream like:
["int", "i_love_$", ",", "_and_", ".", "s", "=", "0"]
As you can see, because C/C++ doesn't treat $ as special, it is lexed as part of the identifier, whereas characters like commas and periods end the identifier and become tokens of their own.
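To make the lexing step concrete, here is a minimal sketch of a tokenizer in C. It is not how GCC or Clang are actually implemented. The names tokenize, is_ident_char and MAX_TOKEN_LEN are hypothetical and invented for this illustration, and accepting $ as an identifier character is an assumption that mirrors the default behaviour described above. It also lumps digits in with identifier characters and treats every other character as a one-character token, which is enough for the examples here but far short of a real C lexer.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_TOKEN_LEN 63 /* hypothetical limit; longer tokens are truncated */

/* Characters that may appear inside an identifier in this sketch:
   letters, digits, '_' and (assumed extension) '$'. */
static int is_ident_char(char c) {
    return isalnum((unsigned char)c) || c == '_' || c == '$';
}

/* Splits src into tokens and stores at most max_tokens of them in out.
   Returns the number of tokens stored. */
static size_t tokenize(const char *src,
                       char out[][MAX_TOKEN_LEN + 1],
                       size_t max_tokens) {
    size_t count = 0;
    const char *p = src;

    while (*p != '\0' && count < max_tokens) {
        if (isspace((unsigned char)*p)) {
            p++; /* whitespace only separates tokens */
            continue;
        }

        const char *start = p;
        if (is_ident_char(*p)) {
            /* Consume the longest run of identifier characters. */
            while (is_ident_char(*p)) {
                p++;
            }
        } else {
            /* Anything else (',', '.', '=', ';', ...) is its own token. */
            p++;
        }

        size_t len = (size_t)(p - start);
        if (len > MAX_TOKEN_LEN) {
            len = MAX_TOKEN_LEN;
        }
        memcpy(out[count], start, len);
        out[count][len] = '\0';
        count++;
    }
    return count;
}
```

Calling tokenize on the string "int i_love_$,_and_.s = 0;" with room for 16 tokens splits it at the comma and the period but keeps i_love_$ in one piece. Unlike the token lists shown above, this sketch also emits the trailing semicolon as a token.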
The 2018 C standard says in 6.4.2.1 that an identifier consists of a nondigit character followed by zero or more nondigit or digit characters, where the nondigit characters are:
- _, a through z, or A through Z,
- \u followed by four hexadecimal digits or \U followed by eight hexadecimal digits, that is outside certain ranges¹, or
- other implementation-defined characters.
The digit characters are 0 through 9.
Taking GCC as an example, its documentation says the additional implementation-defined characters are described in its preprocessor section, and that section says GCC accepts $ and the characters that correspond to the universal character names². Thus, allowing $ is a choice made by the compiler implementors.
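The identifier rule can be sketched as a small checker in C, with the implementation's choice modelled as a flag. This is only an illustration under stated assumptions: is_nondigit, is_identifier and the allow_dollar parameter are hypothetical names, universal character names (\u and \U) are deliberately left out, and keywords such as int are not rejected.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* A "nondigit" in this sketch: '_' or a letter (as classified by isalpha
   in the default "C" locale), plus '$' when the hypothetical allow_dollar
   flag models an implementation that accepts it. */
static bool is_nondigit(char c, bool allow_dollar) {
    if (c == '_' || isalpha((unsigned char)c)) {
        return true;
    }
    return allow_dollar && c == '$';
}

/* Returns true if s is a nondigit followed by zero or more
   nondigit or digit characters. */
static bool is_identifier(const char *s, bool allow_dollar) {
    if (s == NULL || !is_nondigit(s[0], allow_dollar)) {
        return false; /* NULL, empty, or starts with a digit or other character */
    }
    for (size_t i = 1; s[i] != '\0'; i++) {
        if (!is_nondigit(s[i], allow_dollar) &&
            !isdigit((unsigned char)s[i])) {
            return false;
        }
    }
    return true;
}
```

With allow_dollar set to true, both "$" and "i_love_$" are accepted; with it set to false, they are rejected, while a name such as "_x1" is accepted either way. A name starting with a digit, such as "9lives", is rejected in both modes.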
Draft n4659 of the 2017 C++ standard has the same rules, in clause 5.10 [lex.name], except it limits the universal character names further.
¹ These \u and \U forms allow you to write any character as a hexadecimal code. The excluded ranges are those in C’s basic character set and codes reserved for control characters and special uses.
² The “universal character names” are the \u and \U forms. The characters that correspond to them are the characters that those forms represent. For example, π is a universal character, and \u03c0 is the universal character name for it.