5

Why are special characters (except underscore) not allowed in variable names of programming languages?

Is there a reason related to computer architecture or organisation?

Peter Mortensen
user3287367
  • It's related to computer _history_. Namely, if you created a variable with `@` in the name, then people who don't have a `@` key can't edit your code. See also: http://en.wikipedia.org/wiki/Digraphs_and_trigraphs – Mooing Duck Jun 16 '14 at 18:56
  • This varies with the language. See http://stackoverflow.com/questions/7656937/valid-identifier-characters-in-scala – Don Roby Jun 16 '14 at 18:58
  • As stated above, this varies per language of course (e.g. Swift allows Unicode identifiers), but another reason (certainly historically) could be that allowing only plain ASCII made the symbol tables shorter (and their manipulation simpler). – ChristopheD Jun 16 '14 at 19:04
  • @MooingDuck They can't send emails either. – tejasvi88 Dec 19 '21 at 10:43

2 Answers

7

Most languages have long histories rooted in the ASCII (or EBCDIC) character sets. Those languages tend to have simple identifier descriptions (e.g., starts with A-Z, followed by A-Z, 0-9, maybe underscore; COBOL allows "-" as part of a name). When all you had was an 029 keypunch or a teletype, you didn't have many other characters, and most of them got used as operator syntax or punctuation.

On older machines, this did have the advantage that you could encode an identifier as a radix 37 number (A-Z, 0-9, null) [6 characters in 32 bits] or a radix 64 number (A-Z, a-z, 0-9, underscore and null) [6 characters in 36 bits, a common word size in earlier generations of machines], which kept symbol tables small. A consequence: many older languages had 6 character limits on identifier sizes (e.g., FORTRAN).
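The radix 37 packing can be sketched roughly as follows. This is only an illustration: the symbol order (null first, then A-Z, then 0-9) and the names `ALPHABET`, `pack` and `unpack` are assumptions made for the example, not the layout any particular machine or compiler used.

```python
import string

# Assumed symbol order: index 0 is the "null" padding symbol, then A-Z, then 0-9.
ALPHABET = "\0" + string.ascii_uppercase + string.digits  # 37 symbols
MAX_LEN = 6


def pack(name: str) -> int:
    """Pack an identifier of up to six characters into one base-37 integer."""
    if len(name) > MAX_LEN:
        raise ValueError("identifier longer than 6 characters")
    value = 0
    for ch in name.ljust(MAX_LEN, "\0"):
        # str.index raises ValueError for any character outside the alphabet,
        # so '@', '$' or lowercase letters are simply not representable.
        value = value * 37 + ALPHABET.index(ch)
    return value


def unpack(value: int) -> str:
    """Recover the identifier from its packed form."""
    chars = []
    for _ in range(MAX_LEN):
        value, digit = divmod(value, 37)
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars)).rstrip("\0")


# The capacity claims: six radix-37 digits fit in 32 bits,
# and six radix-64 digits fill exactly 36 bits.
assert 37 ** 6 <= 2 ** 32
assert 64 ** 6 == 2 ** 36
```

With this scheme, every extra permitted character enlarges the radix, so allowing punctuation in names would have cost either bits per identifier or characters per word.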

LISP languages have long been much more permissive; names can be anything but characters with special meaning to LISP, e.g., ( ) [ ] ' ` #, and usually there are ways to insert these characters into names using some kind of escape convention. Our PARLANSE language is like LISP; it uses "~" as an escape, so you can write ~(begin+~)end as a single identifier whose actual spelling is "(begin+)end".

More modern languages (Java, C#, Scala, ..., uh, even PARLANSE) grew up in an era of Unicode, and tend to allow most of Unicode in identifiers (actually, they tend to allow named Unicode subsets as parts of identifiers). An identifier made of Chinese characters is perfectly legal in such languages.
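Python is not one of the languages named above, but it follows the same pattern and makes the rule easy to poke at. The sketch below uses the built-in `str.isidentifier()` and `unicodedata.category()`; the helper name `describe` and the sample names are just illustrations.

```python
import unicodedata


def describe(name: str) -> tuple[bool, list[str]]:
    """Return whether Python accepts `name` as an identifier,
    together with the Unicode general category of each character."""
    return name.isidentifier(), [unicodedata.category(ch) for ch in name]


# Letters from any script are accepted; punctuation and symbols are not.
for sample in ["total_count", "变量", "a@b", "☃"]:
    ok, categories = describe(sample)
    print(f"{sample!r}: identifier={ok}, categories={categories}")
```

The categories show the "named Unicode subsets" idea: letter categories such as `Lu`, `Ll` and `Lo` are allowed, while `@` (category `Po`) and the snowman (category `So`) are rejected.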

It's kind of a matter of taste in the Western hemisphere: most identifier names still tend to use just letters and digits (sometimes, Western European letters). I don't know what the Japanese and Chinese really use for identifier names when they have Unicode-capable character sets; what little Asian code I have seen tends to follow Western identifier conventions, but the comments tend to use much more of the local native and/or Unicode character set.

Ira Baxter
1

Fundamentally it is because they're mostly used as operators or separators, so it would introduce ambiguity.

> Is there a reason related to computer architecture or organisation?

No. The computer can't see the variable names. Only the compiler can. But it has to be able to distinguish a variable name from two variable names separated by an operator, and most language designers have adopted the principle that the meaning of a computer program should not be affected by white space.
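A toy lexer makes the ambiguity concrete. This is a minimal sketch, not how any real compiler is written: `tokenize`, `STRICT` and `HYPHENATED` are names invented for the example, and the only operators it knows are `+ - * /`.

```python
import re

# Two hypothetical identifier rules: the usual one, and one that also
# allows '-' inside a name (as COBOL does).
STRICT = r"[A-Za-z_][A-Za-z0-9_]*"
HYPHENATED = r"[A-Za-z_][A-Za-z0-9_-]*"


def tokenize(source: str, ident_pattern: str) -> list[tuple[str, str]]:
    """Split `source` into (kind, text) tokens using the given identifier rule."""
    token_re = re.compile(rf"\s*(?:(?P<ident>{ident_pattern})|(?P<op>[-+*/]))")
    source = source.rstrip()
    pos, tokens = 0, []
    while pos < len(source):
        match = token_re.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        kind = match.lastgroup  # "ident" or "op"
        tokens.append((kind, match.group(kind)))
        pos = match.end()
    return tokens


print(tokenize("a-b", STRICT))        # three tokens: a, -, b
print(tokenize("a-b", HYPHENATED))    # one token: the name a-b
print(tokenize("a - b", HYPHENATED))  # three tokens again, thanks to the spaces
```

Once `-` is legal inside a name, `a-b` and `a - b` no longer mean the same thing, which is exactly the whitespace sensitivity most language designers wanted to avoid.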

user207421
  • I love the snowman operator! (And I wish that more languages considered whitespace a more relevant separator :-/) – user2864740 May 11 '16 at 19:50
  • @user2864740 More relevant than what? It's already relevant. What do you mean by 'more'? And what is the snowman operator? – user207421 Apr 09 '20 at 09:44
  • The [snowman operator](https://www.compart.com/en/unicode/U+2603). – user2864740 Apr 09 '20 at 14:03
  • @user2864740 That's a Unicode character. In what language is it an operator? – user207421 Apr 18 '21 at 10:09
  • It was an example. Haskell allows defining operators with such names, as an example of a language in which such an operator can be valid: https://wiki.haskell.org/Unicode-symbols Limiting to some subset of ASCII is merely a language design choice / restriction. – user2864740 Apr 18 '21 at 20:13
  • Scala (as does Java!) also allows Unicode in identifiers, although only some characters can start operators. This is related to Scala’s lack of significant white space. https://stackoverflow.com/questions/7656937/valid-identifier-characters-in-scala , https://stackoverflow.com/questions/44782005/scala-custom-operator-example-abs – user2864740 Apr 18 '21 at 20:19
  • (Oops, it’s not the significant white-space issue with the leading operator ‘sigil’ in Scala, although that is relevant in some other constructions: still, it comes down to how the language grammar was defined.) – user2864740 Apr 18 '21 at 20:38