I'm learning C++ and I cannot figure out how to print special characters in C++. Even though I've seen other posts related to this issue, none of them solves it:

I just want to print out these characters: '♦', '♥', '♣', '♠'.

And when I do this:

std::cout << '♦' << std::endl;

It prints out a number such as 14850470, but when I pass the char to a function, such as:

#include <iostream>

char foo(char a)
{
    return a;
}

int main()
{
    std::cout << foo('♦') << std::endl;
}

It prints out 'ª' instead. Any ideas?

(I'm writing this in VSCode with the MSVC compiler on Windows.)

EDIT: The answers solved my problem (I executed chcp 65001 on the command line). But I had to change this std::cout << '♠' << std::endl; to this std::cout << "♠" << std::endl; to make it work, since printing it as a char prints nothing on the console.

Davichete
    This question has already been marked as a duplicate, so new answers cannot be added. However, I thought it useful to explain the reported behavior. My guess is that the source file is UTF-8 encoded, but that the Microsoft compiler is invoked without either the `/utf-8` or `/source-charset:utf-8` options. In this case, the compiler will interpret the source file according to the active code page, probably Windows-1252. The ♦ character is encoded as E2 99 A6 in UTF-8. If interpreted as Windows-1252, this makes `'♦'` a multi-character literal with type `int`. – Tom Honermann Sep 03 '19 at 02:59
    The value of multi-character literals is implementation defined (http://eel.is/c++draft/lex.ccon#2). The Microsoft compiler implements them via a logical left shift of the current literal value (initially 0) by 8 bits followed by a logical-or of the current code unit. So, the series of code units E2 99 A6 in `'♦'` (when interpreted as Windows-1252) results in an `int` value of `0x00E299A6` or 14850470. – Tom Honermann Sep 03 '19 at 03:09
    So what you have going on here is typical mojibake. In addition to having to run `chcp 65001` to set the console encoding, you should also compile with (minimally) the `/source-charset:utf-8` option so that the source code is read as UTF-8 by the compiler, and possibly with `/execution-charset:utf-8` so that the compiler doesn't transcode the (UTF-8) literals read from the source code to the ACP (the default execution character encoding). But be careful with the `/execution-charset:utf-8` option as it doesn't affect the actual run-time encoding! – Tom Honermann Sep 03 '19 at 03:14
  • Great explanation, thank you so much! I've tried running those commands along with my executable, but the result is the same, only the `"♣"` got printed out, printing `'♣'` prints a number as you said, and `u8"♣"` prints another character, in my case, `â™ ` – Davichete Sep 03 '19 at 09:40
    If `u8"♣"` prints something different, that is again because mojibake is occurring; either due to a mismatch of encoding expectations for your source file, for the execution encoding, or for the console. Again, I suspect what is happening is that your source file is UTF-8 encoded, and you are compiling without the `/source-charset:utf-8` or `/utf-8` options. The contents of the literal will then be interpreted according to the ACP. Assuming Windows-1252, the UTF-8 bytes E2 99 A3 would produce a string containing `"â™£"`. I'm not sure why the last character isn't printing "right" for you. – Tom Honermann Sep 03 '19 at 21:56
  • @Tom Oh, I was running those commands along with the executable, now I have run them along with the compile, and it works! Now I get the expected character printing `"♣"` and `u8"♣"`. Thank you! – Davichete Sep 03 '19 at 22:58

2 Answers


There is no portable way.

On Linux/macOS, terminals have adopted UTF-8 as the default encoding, so when you write UTF-8 bytes to the standard output, the '♦' character appears correctly.

On Windows 10 1607 or later, chcp 65001 works fine except when printing color emoji: the previously broken font configuration dialog has been fixed, so you can select a TrueType font for a chcp 65001 console. In your program, write UTF-8 bytes to the standard output; before you run your program, run chcp 65001 and configure the font.

On Windows versions before Windows 10 1607, you should give up trying to print Unicode.


In C++20, the C++ standard committee adopted char8_t, a type that is assumed to hold UTF-8. In some future C++ (C++26? C++29?), when <locale> and <iostream> are reworked to support char8_t, you may be able to print a Unicode character portably.


In my opinion, you should give up trying to print Unicode characters to the console. Instead, create a GUI using a library that supports TrueType and OpenType fonts.

yumetodo

Assuming you're happy to have your code work only on Windows, I think the answer to your question can be found here:

How to use unicode characters in Windows command line?

That answer doesn't go into a lot of detail; this should help: https://learn.microsoft.com/en-us/windows/console/low-level-console-output-functions

This way might be quicker and easier though: How to print Unicode character in C++?

David Oldford