I'm having trouble understanding how to use write() to print Unicode characters. The character I'm using takes 3 bytes in UTF-8, so with an array of 3 chars there is no problem printing it. This prints the character 'Ƹ':

#include <locale.h>
#include <unistd.h>

int     main(void)
{
    setlocale(LC_ALL, "en_US.UTF-8");
    char uni[3] = {0x00, 0xC6, 0xB8};
    write(1, uni, 3);
    return (0);
}

The question is: if wchar_t is also 3 bytes long, and write() just prints the number of bytes given as an argument, why doesn't the following code work?

#include <locale.h>
#include <wchar.h>
#include <unistd.h>

int     main(void)
{
    setlocale(LC_ALL, "en_US.UTF-8");
    wchar_t uni = L'\xC6B8';
    write(1, &uni, sizeof(wchar_t));
    return (0);
}

I've also tried initializing the wchar_t like this: wchar_t uni = 0xC6B8; and the result is the same: just two unprintable characters (��).

  • So much confusion... :/ `wchar_t` is very unlikely to be 3 bytes long on your machine. UTF-8 is a variable length encoding. Sequence of `wchar_t`s is *not* using UTF-8. `L'\xC6B8'` has the same binary representation as `"\xC6\xB8"` only on machines with big endian integer representation. – milleniumbug Dec 08 '17 at 23:55
  • Hmm, okay, thanks. So on little endian, how should I initialize the wchar_t? Because if I do `wchar_t uni = '\xC6\xB8';` I get `error: multi-character character constant [-Werror=multichar]` – latiagertrutis Dec 09 '17 at 00:19
  • `printf("\xC6\xB8")` should work on *nix. Or are you using Windows? – Barmak Shemirani Dec 09 '17 at 00:24
  • I'm on Linux. I mean initializing the wchar_t not with the Unicode symbol itself but with something I can operate on first (a hexadecimal number, for example). – latiagertrutis Dec 09 '17 at 00:33
  • Don't use `wchar_t` to store a code point. The width of the type is implementation specific, and, in fact, *can't store a code point* on Windows. – milleniumbug Dec 09 '17 at 00:41
  • So an n-size char array would work well for this purpose? With n depending on which code point I'm using. – latiagertrutis Dec 09 '17 at 00:49
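
As a rough illustration of the char-array approach discussed in the comments above, a code point can be kept as a plain integer (for example 0x01B8 for 'Ƹ'), manipulated as needed, and encoded to UTF-8 just before calling write(). The sketch below assumes the terminal expects UTF-8; the helper names `utf8_encode` and `write_codepoint` are hypothetical and not part of any library.

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Encode one Unicode code point as UTF-8 into out (room for 4 bytes).
 * Returns the number of bytes stored, or 0 if cp cannot be encoded. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                        /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                       /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                     /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                       /* surrogates are not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                   /* 4 bytes: 11110xxx then three 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                               /* beyond the Unicode range */
}

/* Encode cp and write it to fd. Returns write()'s result, or -1 if cp is invalid. */
ssize_t write_codepoint(int fd, uint32_t cp)
{
    unsigned char buf[4];
    size_t len = utf8_encode(cp, buf);

    if (len == 0)
        return -1;
    return write(fd, buf, len);
}
```

Calling `write_codepoint(1, 0x01B8)` sends the two bytes 0xC6 0xB8 to standard output. The result does not depend on the byte order of the machine, because the bytes are produced by shifts and masks rather than by reinterpreting the integer's memory.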

1 Answer

setlocale(LC_ALL, "en_US.UTF-8");
char uni[3] = {0x00, 0xC6, 0xB8};
write(1, uni, 3);

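Instead of the above code, please use the following, which writes only the 2 bytes of the UTF-8 encoding of 'Ƹ' (0xC6 0xB8) and assumes the source file is saved as UTF-8:

setlocale(LC_ALL, "en_US.UTF-8");
write(1, "Ƹ", 2);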
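
If the value really has to live in a wchar_t, one possible approach is to let the C library convert it to the locale's multibyte encoding with the standard function wcrtomb() and pass the resulting bytes to write(). This is only a sketch: the helper name `write_wchar` is hypothetical, and it assumes a platform such as Linux with glibc, where a wchar_t holds the Unicode code point (0x01B8 for 'Ƹ', not 0xC6B8).

```c
#include <limits.h>
#include <string.h>
#include <unistd.h>
#include <wchar.h>

/* Convert one wide character to the current locale's multibyte encoding
 * and write the resulting bytes to fd.
 * Returns write()'s result, or -1 if wc has no representation in the locale. */
ssize_t write_wchar(int fd, wchar_t wc)
{
    char buf[MB_LEN_MAX];       /* large enough for any single character */
    mbstate_t state;
    size_t len;

    memset(&state, 0, sizeof state);    /* start in the initial shift state */
    len = wcrtomb(buf, wc, &state);
    if (len == (size_t)-1)
        return -1;              /* not representable in the current locale */
    return write(fd, buf, len);
}
```

The conversion depends on the active locale, so the program would call `setlocale(LC_ALL, "en_US.UTF-8")` first (checking that it does not return NULL) and then `write_wchar(1, 0x01B8)`. In the default "C" locale the same call may fail for non-ASCII characters.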

Understanding and writing wchar_t in C

nexdev