So, after reading the Haskell spec (which presumably influenced Elm) and the JavaScript spec, plus some trial and error, I have arrived at the following rules:
- An identifier must begin with a character from one of these Unicode categories:
  - Uppercase letter (Lu) (modules, types)
  - Lowercase letter (Ll) (functions, variables)
  - Titlecase letter (Lt) (modules, types)
- The rest of the characters must belong to any of the following categories:
  - Uppercase letter (Lu)
  - Lowercase letter (Ll)
  - Titlecase letter (Lt)
  - Modifier letter (Lm)
  - Other letter (Lo)
  - Decimal digit number (Nd)
  - Letter number (Nl)
  - Or be `_` (except in module names).
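
To make the rules concrete, here is a rough sketch of them as a checker in Python, using the standard `unicodedata` module to look up each character's category. The function name `is_valid_identifier` and the `allow_underscore` flag are my own invention for illustration, not anything from the Elm compiler. It only checks a single name, so it knows nothing about reserved words or the dots in qualified module names.

```python
import unicodedata

# Categories allowed for the first character, per the rules above.
START_CATEGORIES = {"Lu", "Ll", "Lt"}
# Categories allowed for every later character.
REST_CATEGORIES = START_CATEGORIES | {"Lm", "Lo", "Nd", "Nl"}


def is_valid_identifier(name: str, allow_underscore: bool = True) -> bool:
    """Check a single name against the rules listed above.

    Hypothetical helper: pass allow_underscore=False for module names.
    """
    if not name:
        return False
    # The first character must be an upper-, lower- or titlecase letter.
    if unicodedata.category(name[0]) not in START_CATEGORIES:
        return False
    for ch in name[1:]:
        # "_" is category Pc, so it needs its own special case.
        if ch == "_" and allow_underscore:
            continue
        if unicodedata.category(ch) not in REST_CATEGORIES:
            return False
    return True
```

With this, `is_valid_identifier("foo_bar1")` passes, while `"_foo"` and `"1foo"` are rejected because of their first character, and `"Foo_Bar"` is rejected when `allow_underscore=False`.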
Technically, "Other number" (No) also seems to be accepted in Elm identifiers, but the code crashes after it's been compiled to JavaScript.
I used this tool to get the ranges for each category.