Handling Unicode characters in C and NCURSES

Question

I'm trying to display some Unicode characters in a C program. Here is a working MWE:

#include <ncurses.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <locale.h>


int main(int argc, char *argv[]) 
{ 
    setlocale(LC_ALL, "");
    initscr();              // Initialize stdscr

    for(int x = 0; x < 20; x++)
    {
        switch (x%5)
        {
            case 0:
                mvaddstr(1, x, "\u2588");
                break;
            case 1:
                mvaddstr(1, x, "\u2593");
                break;
            case 2:
                mvaddstr(1, x, "\u2592");
                break;
            case 3:
                mvaddstr(1, x, "\u2591");
                break;
            case 4:
                mvaddstr(1, x, " ");
                break;
        }
    }

    mvprintw(3, 0, "Press ANY KEY to finish");
    refresh();
    int ch = getch();
    endwin();

    return 0;
}

To compile, use `gcc -o shades shades.c -lncursesw`. It compiles fine and displays the shades correctly.

But instead of using a switch statement, I would like to put my characters into an array of hexadecimal code points and iterate over it, as in the (failed) attempt below.

#include <ncurses.h>
#include <stdlib.h>
#include <stdio.h>
#include <locale.h>

int main(int argc, char *argv[]) 
{ 
    setlocale(LC_ALL, "");
    initscr();              // Initialize stdscr

    uint shades[5] = { 0x2588,
                       0x2593,
                       0x2592,
                       0x2591,
                       ' '};

    char utfchar[7];

    for(int x = 0; x < 20; x++)
    {
        sprintf(utfchar, "\\u%04x", shades[x%5]);
        mvaddstr(1, x, utfchar);
    }

    mvprintw(3, 0, "Press ANY KEY to finish");
    refresh();

    int ch = getch();
    endwin();

    return 0;
}

Here I'm using sprintf to convert the hexadecimal value into a string formatted as \u0000, where 0000 is the hexadecimal code point. Then I call mvaddstr as in the previous code, since mvaddstr expects a const char * as its third argument.

This is one of many failed attempts. I can't get the strings built in a proper Unicode format, and I can't pass a variable holding Unicode content as the argument to mvaddstr.

How can I build a Unicode-capable const char * from a `uint` holding a valid Unicode code point, so that I can pass it to mvaddstr?

PS: I'm using plain C on Linux, not C++, so C++ solutions won't help.

Maybe change the array to `const char * shades[5] = { "\u2588", "\u2593", "\u2592", "\u2591", ""};` — 001, May 15 '20 at 05:52
Parsing of escape sequences in string and character literals (like `"\u2588"`) is done at *compile time*, by the compiler itself. You can't create escape sequences at run time. As for what e.g. `"\u2588"` does: the compiler replaces it with the encoding of U+2588 in the execution character set, which with GCC is UTF-8 by default (the three bytes 0xE2 0x96 0x88). — Some programmer dude, May 15 '20 at 05:52
Is there no C function that encodes a code point value as a valid Unicode string? Anyhow, the idea of using a `const char *` array works. But since the character values are generated semi-dynamically in my code, I would like to be able to *encode* hex values into valid Unicode characters at run time. — Lin, May 15 '20 at 06:01
You can use the `wchar_t` functions as long as your system and locale expect a `wchar_t` to hold a Unicode code point. Ncurses has wide-character support; see https://stackoverflow.com/questions/15222466/display-wchar-t-using-ncurses, for example. — rici, May 15 '20 at 06:27
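To make the compile-time versus run-time distinction from the comments concrete, here is a small sketch in plain C (no ncurses). It contrasts what the question's sprintf call produces with what the compiler produces for the literal. `runtime_escape_text` and `compile_time_literal` are hypothetical names, and the byte values assume a UTF-8 execution character set (GCC's default).

```c
#include <stdio.h>
#include <string.h>

/* What the question's loop builds at run time: six plain characters
   (a backslash, 'u', '2', '5', '8', '8'), not the character U+2588. */
void runtime_escape_text(char *out, size_t size, unsigned int cp)
{
    snprintf(out, size, "\\u%04x", cp);
}

/* What the compiler builds from the escape inside a literal: the
   encoding of U+2588 in the execution character set. Assuming UTF-8,
   that is the three bytes 0xE2 0x96 0x88. */
static const char compile_time_literal[] = "\u2588";
```

Passing the first string to mvaddstr prints the text `\u2588` literally; only the second one is the block character.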

score 1 · Accepted Answer · answered May 15 '20 at 06:04

You could simply put the strings in your array:

const char *shades[] = { "\u2588",
                         "\u2593",
                         "\u2592",
                         "\u2591",
                         " "};

for(int x = 0; x < 20; x++)
{
    mvaddstr(1, x, shades[x%5]);   /* shades has 5 entries */
}

If you want to do it with code points, you need to encode them as UTF-8 (or whatever encoding ncurses expects):

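#include <stdio.h>
#include <stdint.h>

/* Encode one code point as a NUL-terminated UTF-8 string.
   buffer needs room for up to 4 bytes plus the NUL. */
void sprintutf8(char *buffer, uint32_t code)
{
    if (code < 0x80)              /* 0xxxxxxx */
        sprintf(buffer, "%c", (int)code);
    else if (code < 0x800)        /* 110xxxxx 10xxxxxx */
        sprintf(buffer, "%c%c",
            (int)(0xC0 | (code >> 6)),
            (int)(0x80 | (code & 0x3F)));
    else if (code < 0x10000)      /* 1110xxxx 10xxxxxx 10xxxxxx */
        sprintf(buffer, "%c%c%c",
            (int)(0xE0 | (code >> 12)),
            (int)(0x80 | (code >> 6 & 0x3F)),
            (int)(0x80 | (code & 0x3F)));
    else                          /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        sprintf(buffer, "%c%c%c%c",
            (int)(0xF0 | (code >> 18)),
            (int)(0x80 | (code >> 12 & 0x3F)),
            (int)(0x80 | (code >> 6 & 0x3F)),
            (int)(0x80 | (code & 0x3F)));
}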

[...]

for(int x = 0; x < 20; x++)
{
    sprintutf8(utfchar, shades[x%5]);   /* shades has 5 entries */
    mvaddstr(1, x, utfchar);
}

I don't know the internal details of the Unicode specifications. Is this code endian-safe? Aren't there any standard C functions to handle Unicode encodings? — Lin, May 15 '20 at 06:10
This code is endian-safe, since it applies bitwise operators to the value rather than reinterpreting its bytes. There were attempts to bring wide characters to C, which IMHO were not handled well, and I'd recommend not going that route (search for wchar_t if you want to anyway). — blld, May 15 '20 at 06:13
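To illustrate the endianness point, the sketch below encodes using only shifts and masks on the value (writing the bytes directly instead of going through sprintf) and compares the result with the literals the compiler encodes for the question's shades. `utf8_encode` and `shades_match` are hypothetical names; the comparison assumes a UTF-8 execution character set (GCC's default).

```c
#include <stdint.h>
#include <string.h>

/* Writes the UTF-8 encoding of code into out (at least 5 bytes),
   NUL-terminates it and returns the number of bytes written.
   Assumes code <= 0x10FFFF and is not a surrogate (0xD800-0xDFFF). */
size_t utf8_encode(char *out, uint32_t code)
{
    size_t n;
    if (code < 0x80) {
        out[0] = (char)code;
        n = 1;
    } else if (code < 0x800) {
        out[0] = (char)(0xC0 | (code >> 6));
        out[1] = (char)(0x80 | (code & 0x3F));
        n = 2;
    } else if (code < 0x10000) {
        out[0] = (char)(0xE0 | (code >> 12));
        out[1] = (char)(0x80 | ((code >> 6) & 0x3F));
        out[2] = (char)(0x80 | (code & 0x3F));
        n = 3;
    } else {
        out[0] = (char)(0xF0 | (code >> 18));
        out[1] = (char)(0x80 | ((code >> 12) & 0x3F));
        out[2] = (char)(0x80 | ((code >> 6) & 0x3F));
        out[3] = (char)(0x80 | (code & 0x3F));
        n = 4;
    }
    out[n] = '\0';
    return n;
}

/* The question's shades as code points, and the same characters as
   literals that the compiler encodes itself. */
static const uint32_t shade_codes[5] = { 0x2588, 0x2593, 0x2592, 0x2591, ' ' };
static const char *const shade_literals[5] = { "\u2588", "\u2593", "\u2592", "\u2591", " " };

/* Returns 1 if every run-time encoding matches its compile-time literal. */
int shades_match(void)
{
    char buf[5];
    for (size_t i = 0; i < 5; i++) {
        utf8_encode(buf, shade_codes[i]);
        if (strcmp(buf, shade_literals[i]) != 0)
            return 0;
    }
    return 1;
}
```

Because each output byte is computed from the numeric value of `code`, the same bytes come out on little-endian and big-endian machines.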

score 1 · Answer 2 · answered May 15 '20 at 06:34

You can simply use wctomb, casting the value to wchar_t, to convert a code point into a multibyte string in the current locale's encoding (UTF-8 in a UTF-8 locale):

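#include <limits.h>   /* MB_LEN_MAX */
#include <stdlib.h>   /* wctomb */
#include <string.h>   /* memset */

unsigned int shades[5] = { 0x2588,
                           0x2593,
                           0x2592,
                           0x2591,
                           ' '};

/* MB_CUR_MAX is not a compile-time constant and wctomb does not write a
   terminating NUL, so size the buffer with the bound MB_LEN_MAX + 1. */
char utfchar[MB_LEN_MAX + 1];

for(int x = 0; x < 20; x++)
{
    memset(utfchar, 0, sizeof utfchar);
    /* Assumes wchar_t holds Unicode code points (true with glibc on Linux)
       and that setlocale(LC_ALL, "") selected a UTF-8 locale. */
    if (wctomb(utfchar, (wchar_t)shades[x % 5]) < 0)
        continue;   /* not representable in the current locale */
    mvaddstr(1, x, utfchar);
}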
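C11 also provides a standard sibling of wctomb that takes a char32_t instead of a wchar_t: c32rtomb from <uchar.h>. The sketch below wraps it for a single code point. `codepoint_to_mb` is a hypothetical helper name; it assumes char32_t values are UTF-32 (glibc defines `__STDC_UTF_32__`) and that the program has already selected a UTF-8 locale, as the question does with `setlocale(LC_ALL, "")`.

```c
#include <limits.h>   /* MB_LEN_MAX */
#include <locale.h>   /* setlocale: callers must select a UTF-8 locale first */
#include <string.h>   /* memset */
#include <uchar.h>    /* char32_t, c32rtomb (C11) */
#include <wchar.h>    /* mbstate_t */

/* Converts one code point into a NUL-terminated multibyte string in the
   encoding of the current LC_CTYPE locale.
   Returns 0 on success, -1 if the character is not representable. */
int codepoint_to_mb(char out[MB_LEN_MAX + 1], char32_t code)
{
    mbstate_t state;
    memset(&state, 0, sizeof state);      /* start in the initial shift state */
    size_t n = c32rtomb(out, code, &state);
    if (n == (size_t)-1)
        return -1;
    out[n] = '\0';                        /* c32rtomb does not NUL-terminate */
    return 0;
}
```

In the question's loop this would replace the wctomb call: `if (codepoint_to_mb(utfchar, shades[x % 5]) == 0) mvaddstr(1, x, utfchar);`. Like wctomb, it depends on the locale, so in the plain "C" locale the block characters are reported as not representable.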
