I have tried many ways to do this, using `scanf()` and `getc()`, but nothing worked. Most of the time, 0 is stored in the supplied variable (maybe indicating wrong input?). How can I make it so that when the user enters any Unicode code point, it is properly recognized and stored in either a string or a `char`?

- You probably want `fgetwc` – William Pursell Mar 29 '21 at 13:16
- It's not even sure your terminal supports those characters. And how do you input that character anyway – Jabberwocky Mar 29 '21 at 13:17
- Maybe related: https://stackoverflow.com/q/4588897/10553341 – Damien Mar 29 '21 at 13:19
- @Damien more related: https://stackoverflow.com/q/66849407/995714 – phuclv Mar 29 '21 at 13:26
- @Jabberwocky most modern OSes have a way to input emojis. Like `Win+.` on Windows and `Cmd+Ctrl+Space` on macOS – phuclv Mar 29 '21 at 13:27
- @phuclv didn't know about `Win+.`, thanks for the hint. – Jabberwocky Mar 29 '21 at 13:28
- @phuclv Effectively. I even have some difficulty understanding the difference... – Damien Mar 29 '21 at 13:33
- I tried.... but I couldn't get the banana into the USB. Got nothing but a mess `:)` – David C. Rankin Mar 29 '21 at 14:21
1 Answer
I'm guessing you already know that C chars and Unicode characters are two very different things, so I'll skip over that. The assumptions I'll make here include:
- Your C strings will contain UTF-8 encoded characters, terminated by a NUL (`\x00`) character.
- You won't use any C functions that could break the per-character encoding, and you will interpret the output of functions like `strlen()` with the understanding that you need to differentiate between C chars and real characters.
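To make that second point concrete, a quick sketch: `strlen()` counts C chars (bytes), not user-visible characters, so a single emoji reports a length of four.

```c
#include <stdio.h>
#include <string.h>

int main(void) {
    const char *s = "\xF0\x9F\x8D\x8C";  /* U+1F34C (banana), four UTF-8 bytes */
    printf("%zu\n", strlen(s));          /* prints 4, not 1 */
    return 0;
}
```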
It really is as simple as:
```c
char input[256];
scanf("%255[^\n]", input);  /* bounded read; the array decays to char* on its own */
printf("%s\n", input);
```
The problem comes from what is providing the input, and what is displaying the output.
```c
#include <stdio.h>

int main(int argc, char **argv) {
    /* UTF-8 encoding of U+1F34C (banana), with an explicit NUL */
    char *banana = "\xF0\x9F\x8D\x8C\x00";
    printf("%s\n", banana);
    return 0;
}
```
This probably won't display a banana. That's because the UTF-8 sequence being written to the terminal isn't being interpreted as a UTF-8 sequence.
So, the first thing you need to do is to configure your terminal. If your program is likely to only use one terminal type, then you might even be able to do this from within the program; however, there are tons of people who use different terminals, some that even cross Operating System boundaries. For example, I'm testing my Linux programs in a Windows terminal, connected to the Linux system using SSH.
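One minimal sketch of the "configure it from within the program" idea: adopt the user's locale so the C runtime and the terminal agree on the encoding. Whether this is enough depends entirely on the terminal; on Windows consoles you would typically also need `SetConsoleOutputCP(CP_UTF8)` from `<windows.h>`, which is outside the C standard.

```c
#include <locale.h>
#include <stdio.h>

int main(void) {
    /* "" means: take the encoding from the environment (e.g. en_US.UTF-8).
       This matters most for the wide-character functions; plain byte
       output like the banana string below is passed through unchanged. */
    setlocale(LC_ALL, "");
    printf("\xF0\x9F\x8D\x8C\n");
    return 0;
}
```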
Once the terminal is configured, your program (which is probably already correct) should display a banana. But even a correctly configured terminal can fail.
After the terminal is verified to be correctly configured, the last piece of the puzzle is the font. Not all fonts contain glyphs for all Unicode characters. The banana is one of those characters that isn't typically typed into a computer, so you need to open up a font tool and search the font for the glyph. If it doesn't exist in that font, you need to find a font that implements a glyph for that character.

- Most of the time, when you get the console configured correctly, the input to your program is correctly sent as UTF-8 bytes. That's when you can dump the hexadecimal of the banana input in a printf to see if you got the right character. – Edwin Buck Mar 29 '21 at 14:20
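A quick sketch of that hex-dump check (assuming the input really arrives as UTF-8 bytes; a banana should show `F0 9F 8D 8C`):

```c
#include <stdio.h>
#include <string.h>

int main(void) {
    char input[256];
    if (fgets(input, sizeof input, stdin) != NULL) {
        input[strcspn(input, "\n")] = '\0';
        /* dump each byte in hex to verify the encoding */
        for (size_t i = 0; i < strlen(input); i++)
            printf("%02X ", (unsigned char)input[i]);
        putchar('\n');
    }
    return 0;
}
```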