My idea was to code a Hangman-like game in C. I want it to be able to use German words with umlauts (e.g. ä, ü, ö) and also Greek words (completely non-ASCII characters).
My compiler and my terminal handle Unicode fine, and displaying the strings works.
But how should I do operations on these strings? For German I could perhaps handle the six accented characters (upper- and lowercase ä, ö, ü) as special cases in my functions. For Greek, however, that seems impossible.
I wrote this test code. For each string it prints the string itself, its length as reported by strlen (wrong for non-ASCII text, because each such character is stored as a multi-byte UTF-8 sequence: two bytes each for ä and for these Greek letters), and the value of every byte as a character and in hex.
#include <stdio.h>
#include <string.h>

int main(void) {
    printf("123456789\n");
    char aTestString[] = "cheese";
    /* strlen returns size_t, which must be printed with %zu, not %d */
    printf("%s has %zu characters\n", aTestString, strlen(aTestString));
    for (size_t i = 0; i < strlen(aTestString); i++) {
        printf("( %c )", aTestString[i]);   // byte as char
        printf("[ %02X ]", aTestString[i]); // byte in hex (sign-extended, e.g. FFFFFFC3, if char is signed)
    }
    printf("\n123456789\n");
    char aTestString2[] = "Käse";
    printf("%s has %zu characters\n", aTestString2, strlen(aTestString2));
    for (size_t i = 0; i < strlen(aTestString2); i++) {
        printf("( %c )", aTestString2[i]);   // byte as char
        printf("[ %02X ]", aTestString2[i]); // byte in hex
    }
    printf("\n123456789\n");
    char aTestString3[] = "λόγος";
    printf("%s has %zu characters\n", aTestString3, strlen(aTestString3));
    for (size_t i = 0; i < strlen(aTestString3); i++) {
        printf("( %c )", aTestString3[i]);   // byte as char
        printf("[ %02X ]", aTestString3[i]); // byte in hex
    }
    printf("\n");
    return 0;
}
For example, what is the recommended way to count the Unicode characters, or to check whether a specific Unicode character (that is, code point) occurs in the string? I am quite sure there must be a simple solution, because such characters are often used in passwords, for example.
Here is the output of the test program:
123456789
cheese has 6 characters
( c )[ 63 ]( h )[ 68 ]( e )[ 65 ]( e )[ 65 ]( s )[ 73 ]( e )[ 65 ]
123456789
Käse has 5 characters
( K )[ 4B ]( )[ FFFFFFC3 ]( )[ FFFFFFA4 ]( s )[ 73 ]( e )[ 65 ]
123456789
λόγος has 10 characters
( )[ FFFFFFCE ]( )[ FFFFFFBB ]( )[ FFFFFFCF ]( )[ FFFFFF8C ]( )[ FFFFFFCE ]( )[ FFFFFFB3 ]( )[ FFFFFFCE ]( )[ FFFFFFBF ]( )[ FFFFFFCF ]( )[ FFFFFF82 ]