Popular software developers and companies (Joel Spolsky, Fog Creek Software) tend to use wchar_t for Unicode character storage when writing C or C++ code. When and how should one use char and wchar_t with respect to good coding practices?
I am particularly interested in POSIX compliance when writing software that leverages Unicode.
When using wchar_t, you can look up characters in an array of wide characters on a per-character or per-array-element basis:
/* C code fragment */
const wchar_t *overlord = L"ov€rlord";

if (overlord[2] == L'€')
    wprintf(L"Character comparison on a per-character basis.\n");
How can you compare Unicode bytes (or characters) when using char?
So far my preferred way of comparing strings and characters of type char in C often looks like this:
/* C code fragment */
const char *mail[] = { "ov€rlord@masters.lt", "ov€rlord@masters.lt" };
if (mail[0][2] == mail[1][2] && mail[0][3] == mail[1][3] && mail[0][4] == mail[1][4])
    printf("%s\n%zu", *mail, strlen(*mail));
This method compares the byte sequence that encodes a Unicode character. The Euro sign € takes up three bytes in UTF-8, so one needs to compare three char array elements to know whether the Unicode characters match. In general you need to know the size in bytes of the character or string you want to compare for this approach to work. This does not look like a good way of handling Unicode at all. Is there a better way of comparing strings and character elements of type char?
In addition, when using wchar_t, how can you read a file's contents into an array? The function fread does not seem to produce valid results.