One thing I find quite confusing is knowing which characters and combinations I can use in method and variable names. For instance:

val #^ = 1 // legal
val #  = 1 // illegal
val +  = 1 // legal
val &+ = 1 // legal
val &2 = 1 // illegal
val £2 = 1 // legal
val ¬  = 1 // legal

As I understand it, there is a distinction between alphanumeric identifiers and operator identifiers. You can mix and match characters of one kind or the other but not both, unless they are separated by an underscore (a mixed identifier).

From Programming in Scala section 6.10,

An operator identifier consists of one or more operator characters. Operator characters are printable ASCII characters such as +, :, ?, ~ or #.

More precisely, an operator character belongs to the Unicode set of mathematical symbols(Sm) or other symbols(So), or to the 7-bit ASCII characters that are not letters, digits, parentheses, square brackets, curly braces, single or double quote, or an underscore, period, semi-colon, comma, or back tick character.

So we are excluded from using ()[]{}'"_.;, and `

I looked up Unicode mathematical symbols on Wikipedia, but the ones I found didn't include +, :, ? etc. Is there a definitive list somewhere of what the operator characters are?

Also, any ideas why Unicode mathematical operators (rather than symbols) do not count as operators?

Luigi Plinge
  • 3
    I particularly miss ². Scala kind of promises one can make code that uses clever variable (and method) names. But you cannot give a value to a variable x². Illegal character. – akauppi Dec 03 '12 at 18:28

3 Answers

73

Working from the EBNF syntax in the specification:

upper ::= ‘A’ | ... | ‘Z’ | ‘$’ | ‘_’ and Unicode category Lu
lower ::= ‘a’ | ... | ‘z’ and Unicode category Ll
letter ::= upper | lower and Unicode categories Lo, Lt, Nl
digit ::= ‘0’ | ... | ‘9’
opchar ::= “all other characters in \u0020-007F and Unicode
            categories Sm, So except parentheses ([]) and periods”

But also taking into account the very beginning of the Lexical Syntax chapter, which defines:

Parentheses ‘(’ | ‘)’ | ‘[’ | ‘]’ | ‘{’ | ‘}’.
Delimiter characters ‘`’ | ‘'’ | ‘"’ | ‘.’ | ‘;’ | ‘,’

Here is what I come up with. Working by elimination in the range \u0020-007F, eliminating letters, digits, parentheses and delimiters, we have for opchar... (drum roll):

! # % & * + - / : < = > ? @ \ ^ | ~ and also Sm and So - except for parentheses and periods.
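
The elimination above can be sketched in code. This is only a rough approximation of the rule, not the compiler's implementation: `OpCharSketch`, `isOpChar` and `asciiOpChars` are names made up for this example, and the Unicode categories come from `java.lang.Character.getType`, so the Sm/So results depend on the Unicode version of the JVM in use.

```scala
object OpCharSketch {
  // ASCII characters that the spec excludes explicitly.
  private val parens     = "()[]{}"
  private val delimiters = "`'\".;,"

  // Hypothetical helper (not part of any Scala API): approximates opchar.
  def isOpChar(c: Char): Boolean = {
    // Printable, non-space ASCII: code points 0x21 to 0x7E.
    val printableAscii = c > 0x20 && c < 0x7F
    // '$' and '_' are grouped with the letters in the grammar quoted above.
    val letterOrDigit = c.isLetterOrDigit || c == '$' || c == '_'
    val asciiOp = printableAscii && !letterOrDigit &&
      !parens.contains(c) && !delimiters.contains(c)
    // Unicode general categories Sm (math symbol) and So (other symbol).
    val tpe = Character.getType(c)
    val symbol = tpe == Character.MATH_SYMBOL.toInt || tpe == Character.OTHER_SYMBOL.toInt
    asciiOp || symbol
  }

  // All ASCII characters accepted by the sketch, in code point order.
  def asciiOpChars: String =
    (0x21 to 0x7E).map(_.toChar).filter(isOpChar).mkString(" ")
}
```

Calling `OpCharSketch.asciiOpChars` should give back the same list as the one derived by hand above.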

In summary, here are some valid examples that highlight all cases. Watch out for \ in the REPL; I had to escape it as \\:

val !#%&*+-/:<=>?@\^|~ = 1 // All simple opchars
val simpleName = 1
val withDigitsAndUnderscores_ab_12_ab12 = 1
val wordEndingInOpChars_!#%&*+-/:<=>?@\^|~ = 1
val !^©® = 1 // opchars and symbols
val abcαβγ_!^©® = 1 // Mixing Unicode letters and symbols

Note 1:

I found this Unicode category index to figure out Lu, Ll, Lo, Lt, Nl:

  • Lu (uppercase letters)
  • Ll (lowercase letters)
  • Lo (other letters)
  • Lt (titlecase)
  • Nl (letter numbers like roman numerals)
  • Sm (symbol math)
  • So (symbol other)

Note 2:

val #^ = 1 // legal   - two opchars
val #  = 1 // illegal - reserved word like class or => or @
val +  = 1 // legal   - opchar
val &+ = 1 // legal   - two opchars
val &2 = 1 // illegal - opchar and digit do not mix arbitrarily
val £2 = 1 // working - £ is part of Sc (Symbol currency) - undefined by spec
val ¬  = 1 // legal   - part of Sm

Note 3:

Other operator-looking things that are reserved words: _ : = => <- <: <% >: # @ and also \u21D2 ⇒ and \u2190 ←
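
The reservation applies to the whole token, not to the individual characters, which is why # alone is rejected while #^ is fine. A minimal sketch of that check, assuming the reserved list above is complete (`ReservedOpsSketch` and `isReservedOp` are hypothetical names used only for illustration):

```scala
object ReservedOpsSketch {
  // Operator-looking reserved words, copied from the note above.
  val reservedOps: Set[String] =
    Set("_", ":", "=", "=>", "<-", "<:", "<%", ">:", "#", "@", "⇒", "←")

  // True only when the complete identifier is one of the reserved tokens.
  def isReservedOp(name: String): Boolean = reservedOps.contains(name)
}
```

With this, `isReservedOp("#")` is true but `isReservedOp("#^")` is false, matching the first two lines of the question.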

Peter Mortensen
huynhjl
  • 2
    Thanks. Also, as the spec says, we are limited to Unicode Basic Multilingual Plane characters, i.e. 2-byte characters up to \ufffd. Thus from `So`, \u262f the Yin Yang operator is legal, but \u1f360 the Roasted Sweet Potato operator is not supported (it's interpreted as \u1f36 + '0'). – Luigi Plinge Oct 05 '11 at 14:51
  • 1
    In Scala 2.9, `£` is now reported as an `illegal character` (presumably the correct behavior wrt the spec). – Mechanical snail Aug 05 '12 at 06:48
  • The § character is also not valid. Any ideas why? – Themerius Apr 14 '16 at 13:16
10

The language specification gives the rule in Chapter 1, Lexical Syntax (on page 3):

  1. Operator characters. These consist of all printable ASCII characters \u0020-\u007F which are in none of the sets above, mathematical symbols (Sm) and other symbols (So).

This is basically the same as your extract from Programming in Scala. + does not need to be a Unicode mathematical symbol to qualify: it is definitely a printable ASCII character not listed above (not a letter, including _ or $, a digit, a parenthesis, or a delimiter).

In your list:

  1. # is illegal not because the character is not an operator character (#^ is legal), but because it is a reserved word (on page 4), for type projection.
  2. &2 is illegal because you mix an operator character, &, and a non-operator character, the digit 2.
  3. £2 is legal because £ is not an operator character: it is not 7-bit ASCII but 8-bit extended ASCII. That is not nice, since $ is not an operator character either (it is considered a letter).
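
To see which Unicode general category a given character falls into, something like the following can help. It is only a sketch: `CategorySketch` and `category` are made-up names, the table covers just a handful of categories, and the values come from `java.lang.Character.getType`, so they reflect the Unicode tables of the JVM rather than anything in the Scala spec.

```scala
object CategorySketch {
  // A few java.lang.Character category constants mapped to their
  // two-letter Unicode names; deliberately incomplete.
  private val names: Map[Int, String] = Map(
    Character.MATH_SYMBOL.toInt          -> "Sm",
    Character.OTHER_SYMBOL.toInt         -> "So",
    Character.CURRENCY_SYMBOL.toInt      -> "Sc",
    Character.MODIFIER_SYMBOL.toInt      -> "Sk",
    Character.OTHER_PUNCTUATION.toInt    -> "Po",
    Character.DECIMAL_DIGIT_NUMBER.toInt -> "Nd",
    Character.OTHER_NUMBER.toInt         -> "No"
  )

  // Hypothetical helper: category name, or "other" if not in the table.
  def category(c: Char): String =
    names.getOrElse(Character.getType(c), "other")
}
```

For the characters in the question, `category('£')` and `category('$')` both give "Sc" (currency symbol), `category('¬')` gives "Sm", and `category('#')` and `category('&')` give "Po", so # and & can only qualify through the ASCII part of the rule.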
Peter Mortensen
Didier Dupont
  • Is "mathematical symbols (Sm) and other symbols (So)" the object of "consist of" or of "in none of"? – Jing He Sep 05 '17 at 13:13
0

Use backticks to escape limitations and use Unicode symbols:

val `r→f` = 150
println(`r→f`)
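
The same trick should also cover the names that were rejected earlier in this thread, such as &2 or x². A small sketch, assuming a reasonably recent compiler and a UTF-8 source file (`BacktickSketch` and `total` are names invented for this example):

```scala
object BacktickSketch {
  // Neither name is a plain identifier: &2 mixes an opchar with a digit,
  // and the superscript two in x² is neither a letter nor an opchar.
  val `&2` = 1
  val `x²` = 4

  // The backticks are needed at every use site as well.
  def total: Int = `&2` + `x²`
}
```

The price is that every reference has to carry the backticks, so this works best for a few well-chosen names.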
Peter Mortensen
Hartmut Pfarr