
I'm trying to print this medium shade unicode box in C: ▒

(I'm doing the exercises in K&R and got sidetracked on the one about making a histogram...) I know my Unix terminal (Mac OS X) can display the box because I saved a text file containing the box, ran `cat textfilewithblock`, and it printed the block.

I initially tried:

#include <stdio.h>
#include <wchar.h>

int main(){
  wprintf(L"▒\n");
  return 0;
}

but nothing was printed:

iMac-2$ ./a.out 
iMac-2:clang vik$

I did a search and found this: unicode hello world for C?

It seems I still have to set a locale (even though the executing environment is UTF-8? I'm still trying to figure out why this step is necessary). But anyway, it works! (After a bit of a struggle, I finally realized that the proper string was en_US.UTF-8 rather than the en_US.utf8 I had read somewhere...)

#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main(){
  setlocale (LC_ALL, "en_US.UTF-8");
  wprintf(L"▒\n");
  return 0;
}

Output is as follows:

iMac-2$ ./a.out 
▒
iMac-2$

But when I try the following code, putting in the UTF-8 hex for the box, 0xe29692 (which I got from here: http://www.utf8-chartable.de/unicode-utf8-table.pl?start=9472&unicodeinhtml=dec ), rather than pasting in the box itself, it doesn't work again.

#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main(){
  setlocale (LC_ALL, "en_US.UTF-8");
  wchar_t box = 0xe29692;
  wprintf(L"%lc\n", box);
  return 0;
}

I'm clearly missing something but can't quite figure out what it is.

royhowie
  • I should note, I'm using the following compiler command: `cc --std=c11` – RandomUser762 Jan 22 '16 at 01:15
  • might be helpful: http://stackoverflow.com/q/12017342/2476755 – royhowie Jan 22 '16 at 01:27
  • On Mac OSX, you don't need `wchar.h`, `wprintf`, or the `L` prefix, or the `setlocale`. If you want to print a box, just print a box: `printf("▒\n");` – user3386109 Jan 22 '16 at 01:32
  • @user3386109: it actually depends on the text editor: it must be configured to save the source file as `UTF-8`. – chqrlie Jan 22 '16 at 01:41
  • @chqrlie True, I forget sometimes that not everybody uses Xcode on a MAC. – user3386109 Jan 22 '16 at 01:43
  • Use `setlocale(LC_ALL, "");` to set the default locale. Use `setlocale(LC_ALL, "C");` to set the C locale. All other names are implementation defined. Your second example works on my Mac with the empty string for the locale name (as it did with the full name you gave). I can also affirm I got no output with your original code. – Jonathan Leffler Jan 22 '16 at 01:55
  • @JonathanLeffler: isn't it a sad choice that `setlocale(LC_ALL, "");` is not the default setting? It should have been possible to only deal with locale translation if any of the multibyte APIs have been invoked and stay in compatibility otherwise. – chqrlie Jan 22 '16 at 02:03
  • @chqrlie: Yes, no, maybe. Given the state of the world when the C89 standard was published, if an alternative decision had been made, it would probably have sunk the standard as every existing C program would have had to be edited to add `setlocale(LC_ALL, "C");` as the first statement. Breaking existing code was something that the C standard committee carefully avoided. The consequence is that you have to add `setlocale(LC_ALL, "");` to new code that you want to run in the user's locale. I think the original decision was correct in 1989; the consequence is inevitable a quarter century later. – Jonathan Leffler Jan 22 '16 at 09:56

2 Answers

5

The Unicode value of the MEDIUM SHADE code point is not 0xe29692, it is 0x2592. <E2><96><92> is the 3-byte UTF-8 encoding of this code point.
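To make the distinction concrete, a minimal sketch (assuming a UTF-8 locale is in effect) can convert the code point into its multibyte encoding with the standard wcrtomb and dump the resulting bytes:

#include <stdio.h>
#include <wchar.h>
#include <locale.h>
#include <limits.h>
#include <string.h>

int main(void) {
    setlocale(LC_ALL, "en_US.UTF-8");

    wchar_t box = 0x2592;                    /* the code point, U+2592 */
    char buf[MB_LEN_MAX];
    mbstate_t state;
    memset(&state, 0, sizeof state);

    size_t len = wcrtomb(buf, box, &state);  /* encode in the current locale */
    if (len != (size_t)-1) {
        for (size_t i = 0; i < len; i++)     /* expected to print: E2 96 92 */
            printf("%02X ", (unsigned char)buf[i]);
        printf("\n");
        fwrite(buf, 1, len, stdout);         /* and this prints the box itself */
        printf("\n");
    }
    return 0;
}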

You can print this thing either using the wide char APIs:

#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main(void) {
    setlocale(LC_ALL, "en_US.UTF-8");
    wchar_t box = 0x2592;
    wprintf(L"%lc\n", box);  // or simply printf("%lc\n", box);
    return 0;
}

Or simply by printing the UTF-8 encoding directly:

#include <stdio.h>

int main(void) {
    printf("\xE2\x96\x92\n");
    return 0;
}

Or if your text editor encodes the source file in UTF-8:

#include <stdio.h>

int main(void) {
    printf("▒\n");
    return 0;
}

But be aware that this will not work: putchar('▒'); in a UTF-8 source file, '▒' is a multi-character constant spanning three bytes, and putchar writes only a single byte.
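A small sketch (assuming a UTF-8 source file and a compiler such as clang or gcc, which will warn about the multi-character constant) makes the problem visible:

#include <stdio.h>

int main(void) {
    /* In a UTF-8 source file, '▒' is a multi-character constant: its int
       value is implementation-defined (often 0xE29692, the three UTF-8
       bytes packed together), so putchar('▒') cannot emit the glyph. */
    printf("'▒' has value %#x\n", (unsigned)'▒');

    printf("▒\n");   /* printf and fputs write all three bytes, so this works */
    return 0;
}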

Also, for full Unicode support and a few more goodies, I recommend using iTerm2 on macOS.

chqrlie
  • The Mac `Terminal.app` supports Unicode just fine; you can even select the encoding to use via different profiles. – nneonneo Jan 22 '16 at 01:54
  • @nneonneo: Yes that's true, but it used to be less complete. I use `iTerm2` for a different reason: it allows more control for remapping modifier keys, which I use a lot, especially when editing locally and remotely with my own version of emacs `;-)` – chqrlie Jan 22 '16 at 01:58
  • Ohhhhh I get it now! Thanks! The utf-8 "encoding" is a string of up to 4 bytes not a single 1 to 4 byte value. – RandomUser762 Jan 22 '16 at 07:41
  • Answer #3 is the correct one. If your editor doesn't support UTF-8 then you need a new editor (yes, I know, you've been using it for 30 years, but it's time to start using modern tools). For those who wonder why `putchar('▒');` doesn't work, it's because it is designed to output a single ASCII byte, not a UTF-8 character (this is where bytes and characters confuse people). – Lloyd Sargent Jan 03 '20 at 17:19
  • I guess I have to wonder what is “non-portable” about Unicode? C compilers don’t care what the editor produces as long as it looks like bytes. – Lloyd Sargent Jan 04 '20 at 21:23
  • The compiler doesn’t care. The following const will be stored: 0xE2 0x96 0x92 0x0a 0x00 in both cases. What you are being confused about is how the EDITOR stores data. If you are using a UTF-8 editor, then you will always get ▒ (assuming you are using an OS that understands UTF-8). The compiler doesn’t care. It’s just a string. – Lloyd Sargent Jan 06 '20 at 00:18
  • @chqrlie UTF-8 is not a wide character format. This is what I think is confusing you. It’s just a string of bytes. A “character” (glyph) may be 1-4 bytes with additional diacritics (which are also 1-4 bytes). – Lloyd Sargent Jan 06 '20 at 00:22
  • The compiler doesn’t care. It never has cared. The compiler does not check for “valid ascii character” strings. I know, I did my masters work in compiler design (and have written a few). You are mangling a lot of ideas to make your point, but you really should read the Unicode standard and learn more about UTF-8 before attempting to argue what will and won’t compile. – Lloyd Sargent Jan 08 '20 at 01:53
  • @LloydSargent: You are correct, it is useless arguing since you know everything. – chqrlie Jan 08 '20 at 03:01
2

The box character is U+2592, which translates to 0xE2 0x96 0x92 in UTF-8. This adaptation of your third program mostly works for me:

#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main(void)
{
    setlocale (LC_ALL, "en_US.UTF-8");
    wchar_t box = 0xe29692;
    wprintf(L"%lc\n", box);
    wprintf(L"\n\nX\n\n");
    box = L'\u2592'; //0xE2 0x96 0x92 = U+2592
    wprintf(L"%lc\n", box);
    wprintf(L"\n\n0x%.8X\n\n", box);
    box = 0x2592;
    wprintf(L"%lc\n", box);
    return 0;
}

The output I get is:

X

▒


0x00002592

▒

The first print operation produces nothing of use; the others work.
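One way to confirm that the first call fails outright, rather than quietly printing an empty line, is to check the return value of wprintf (a minimal sketch, assuming the C library reports the encoding error that way):

#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main(void)
{
    setlocale(LC_ALL, "en_US.UTF-8");
    wchar_t box = 0xe29692;           /* not a valid code point (above 0x10FFFF) */
    int rc = wprintf(L"%lc\n", box);  /* the %lc conversion cannot encode it */
    if (rc < 0)
        perror("wprintf");            /* stderr is still byte-oriented here */
    return 0;
}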

Tested on Mac OS X 10.10.5. I happen to be compiling with GCC 5.3.0 (which I compiled myself), but I got the same output with Xcode 7.0.2 and clang.

Jonathan Leffler