I am writing a program to get the "sum" of a word based on its letters (e.g. "abc" = a+b+c = 1+2+3 = 6). For each character c, I am using total += (int) c - 'a' + 1
(in Java). The program should be case-insensitive ('A' = 'a'), so I first want to convert the character to lowercase if necessary. I have written
if (c < 'a') { c += 32; }
which is correct in UTF-16 and ASCII, but not UTF-8.
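For concreteness, here is a minimal sketch of the approach described above. The class name `WordSum` and the method name `wordSum` are placeholders chosen for illustration, and the sketch assumes the input contains only the letters A-Z and a-z; anything else would give a meaningless sum.

```java
public class WordSum {
    // Sums letter positions (a=1, b=2, ..., z=26), treating upper and lower case alike.
    // Assumption: the input contains only the ASCII letters A-Z and a-z.
    static int wordSum(String word) {
        int total = 0;
        for (char c : word.toCharArray()) {
            if (c < 'a') {
                // 'A'..'Z' are 65..90 and 'a'..'z' are 97..122, so adding 32 lowercases a letter.
                c += 32;
            }
            total += (int) c - 'a' + 1;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(wordSum("abc")); // prints 6
        System.out.println(wordSum("ABC")); // prints 6
    }
}
```

The alternative I am considering would replace the `if` block with `c = Character.toLowerCase(c);`.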
My question is: if I were to ship this code, how does encoding work after compilation? If the user's system uses UTF-8, will the program fail (in which case it would be better to use Character.toLowerCase()), or, since the program is written in Java, will the characters inside the program always use the program's own encoding, so that the code works regardless?
In case it isn't clear, I have no idea what I'm talking about, so some general info about how the encoding works would be great too.