What's wrong with this Haskell unicode variable name?

Question

What's wrong with this code?

Prelude> let xᵀ = "abc"
<interactive>:10:6: lexical error at character '\7488'

According to my reading of the Haskell 2010 report, any uppercase or lowercase Unicode letter should be valid at the end of a variable name. Does the ᵀ character (MODIFIER LETTER CAPITAL T) not qualify as an uppercase Unicode letter?

Is there a better character to represent the transpose of a vector? I'd like to stay concise since I'm evaluating a dense mathematical formula.

I'm running GHC 7.8.3.

What's wrong with the ASCII `xT`? It takes up the same number of columns and doesn't require unicode characters, which makes it printable in environments that have subpar unicode support (i.e. Windows) — bheklilr, Jul 24 '14 at 19:27
@bheklilr: "why would anybody use Windows" aside, I think even that should nowadays have no problem with UTF-8 encoded files. I wouldn't happily use `xT` because it looks more like the `x` is a prefix to the `T` rather than the other way around; but `x'` does look rather fine to me. — leftaroundabout, Jul 24 '14 at 19:31
I'm trying to represent the outer product xᵀx. I could use xTx, but I don't find it quite expressive enough. — tba, Jul 24 '14 at 19:40
MODIFIER LETTER CAPITAL T is in the Phonetic Extensions block and was coded for the purposes of phonetic notations. This explains some of its properties. — Jukka K. Korpela, Jul 24 '14 at 19:58
@leftaroundabout "why would anybody use Windows" -> I do because it's what I'm forced into for employment. While we don't use Haskell for any projects at work, I do use it for data processing tasks and for experimentation. I do really wish that Haskell had better Windows support, because it would make my life a lot easier. Besides, Haskell isn't going to gain much momentum without good support for what is still the most popular desktop and enterprise OS. — bheklilr, Jul 24 '14 at 20:07
@tba I don't know about that; I've written code quite like this before in Python with the variables `x`, `xT`, `xTx` and `xTxi`, where the `i` indicated inverse. I haven't found that the code is particularly unclear to read, so long as you keep the context in mind. If you really wanted, separate with underscores or tick marks, maybe `x`, `xT`, `xT'x`, `xT'x'i`, but it's up to you. — bheklilr, Jul 24 '14 at 20:10
@bheklilr Note that in Python `xᵀ` is a valid identifier (Python doesn't make any distinction between lowercase and uppercase, hence *any* Unicode character that belongs to the category `Letter` can be used in identifiers). — Bakuriu, Jul 24 '14 at 20:21
@Bakuriu That's interesting to know, but I've discovered that some of the tools we use (and some people's editors) are not so unicode friendly and will mangle any special characters, which makes things really fun when I've got to have Spanish and Mandarin translations for my applications. Personally, I'm just going to avoid using unicode characters until it's supported better across platforms and tools. That's not to say I don't _want_ to use them, I happen to think Haskell looks really nice when sprinkled with actual symbols rather than ASCII art, it just isn't feasible for me yet =( — bheklilr, Jul 24 '14 at 20:25

score 8 · Accepted Answer · answered Jul 24 '14 at 19:19

Uppercase Unicode letters are in the Unicode character category Letter, Uppercase [Lu].

Lowercase Unicode letters are in the Unicode character category Letter, Lowercase [Ll].

MODIFIER LETTER CAPITAL T is in the Unicode character category Letter, Modifier [Lm].
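The categories can be checked directly from Haskell. A minimal sketch using `generalCategory` from `Data.Char`; the `samples` and `categories` names are only illustrative:

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- Characters discussed in the question and answers (illustrative list).
samples :: [(String, Char)]
samples =
  [ ("LATIN CAPITAL LETTER T", 'T')
  , ("LATIN SMALL LETTER X", 'x')
  , ("MODIFIER LETTER CAPITAL T", '\x1D40') -- ᵀ, shown by GHCi as '\7488'
  , ("SUBSCRIPT ONE", '\x2081')             -- ₁
  ]

-- Pair each sample name with the Unicode general category of its character.
categories :: [(String, GeneralCategory)]
categories = [ (name, generalCategory c) | (name, c) <- samples ]
```

Loaded into GHCi, `generalCategory '\x1D40'` evaluates to `ModifierLetter`, so ᵀ matches neither the uppercase nor the lowercase letter production.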

I tend to stick to ASCII, so I'd probably just use a name like xTrans or x', depending on how many lines it is in scope for.
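For the original use case, an ASCII prime works as the transpose marker. A small sketch, assuming matrices are plain lists of rows; `mmul` and `gram` are hypothetical helpers written for this example, not a library API:

```haskell
import Data.List (transpose)

-- Hypothetical helper: naive matrix product on lists of rows.
-- A real program would more likely use a linear-algebra library.
mmul :: Num a => [[a]] -> [[a]] -> [[a]]
mmul a b = [ [ sum (zipWith (*) row col) | col <- transpose b ] | row <- a ]

-- Computes x'x, i.e. the transpose of x times x.
-- x' plays the role of xᵀ; xTrans would work just as well as a name.
gram :: Num a => [[a]] -> [[a]]
gram x = x' `mmul` x
  where
    x' = transpose x
```

For example, `gram [[1,2],[3,4]]` evaluates to `[[10,14],[14,20]]`, and for a column vector such as `[[1],[2],[3]]` it gives the 1×1 matrix `[[14]]`.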

score 8 · Answer 2 · answered Jul 24 '14 at 19:28

Characters not in the category ANY are not valid in Haskell programs and should result in a lexing error.

where

ANY         →   graphic | whitechar 

graphic     →   small | large | symbol | digit | special | " | '

small       →   ascSmall | uniSmall | _
ascSmall    →   a | b | … | z
uniSmall    →   any Unicode lowercase letter

...

uniDigit    →   any Unicode decimal digit 

...

Modifier letters like ᵀ are not legal Haskell at all. (Unlike sub- or superscript numbers – which are in the Number, Other category so a₁ is treated much like a1.)
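As a rough model of the report's wording (not of GHC's actual lexer), the `small`, `large` and `digit` productions can be written as predicates over `generalCategory`. The `report*` function names are hypothetical, reserved words are not excluded, and `uniLarge` is taken as "any uppercase or titlecase Unicode letter", as the full report defines it:

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- small → ascSmall | uniSmall | _
-- ASCII a..z are themselves in category Ll, so one check covers both.
reportSmall :: Char -> Bool
reportSmall c = c == '_' || generalCategory c == LowercaseLetter

-- large → ascLarge | uniLarge (uppercase or titlecase letters)
reportLarge :: Char -> Bool
reportLarge c = generalCategory c `elem` [UppercaseLetter, TitlecaseLetter]

-- digit → ascDigit | uniDigit (Unicode decimal digits, category Nd)
reportDigit :: Char -> Bool
reportDigit c = generalCategory c == DecimalNumber

-- varid → small {small | large | digit | '}  (reserved words ignored here)
reportVarId :: String -> Bool
reportVarId [] = False
reportVarId (c : cs) = reportSmall c && all isTail cs
  where
    isTail ch = reportSmall ch || reportLarge ch || reportDigit ch || ch == '\''
```

Under this model `reportVarId "x'"` is `True` and `reportVarId "x\x1D40"` (xᵀ) is `False`. Read literally, `uniDigit` covers only decimal digits, so the sketch also rejects `a₁`, even though, as noted above, GHC treats it much like `a1`; the sketch is therefore stricter than GHC.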

I like to use non-ASCII Unicode when it helps readability, but unless you've already assigned another meaning to the prime symbol, using it here for transpose should be just fine.
