5

As the question says, what do I have to do in order to print Unicode characters to the output console? And what settings do I have to use? Right now I have this code:

wchar_t* text = L"the 来";
wprintf(L"Text is %s.\n", text);
return EXIT_SUCCESS;

and it prints: Text is the ?.

I've tried changing the output console's font to MS Mincho, Lucida Console and a bunch of others, but they still don't display the Japanese character.

So, what do I have to do?

Eärendil Baggins
  • 1
    The MSVC [man page](https://msdn.microsoft.com/en-us/library/wc7014hz.aspx) says "`printf` does not currently support output into a UNICODE stream." You could try `wprintf` but it is doubtful that a console monospace font will print what you need. – Weather Vane Oct 01 '17 at 12:44
  • Read http://utf8everywhere.org/ – Basile Starynkevitch Oct 01 '17 at 12:45
  • https://stackoverflow.com/questions/388490/unicode-characters-in-windows-command-line-how – Artemy Vysotsky Oct 01 '17 at 12:45
  • Edited the question as I tried what you both said but it still doesn't work. I also changed the console font to MS Mincho, which should theoretically display Chinese characters, but it still shows a `?` – Eärendil Baggins Oct 01 '17 at 12:46
  • @ArtemyVysotsky Quoting a comment from that answer: "Note there are serious implementation bugs in Windows's code page 65001 support which will break many applications that rely on the C standard library IO methods, so this is very fragile. (Batch files also just stop working in 65001.) Unfortunately UTF-8 is a second-class citizen in Windows" Definitely not gonna use that – Eärendil Baggins Oct 01 '17 at 12:47
  • 1
    You are not using UTF-8. Widechar output uses UTF-16 on windows – Artemy Vysotsky Oct 01 '17 at 12:48
  • @ArtemyVysotsky Printing the UTF-16 would be fine too. – Eärendil Baggins Oct 01 '17 at 12:50
  • So I would still recommend using UTF-8 and document that choice to your user (who should be responsible in using some UTF-8 command window). – Basile Starynkevitch Oct 01 '17 at 12:50
  • 1
    This answer is most relevant https://stackoverflow.com/a/9051543/8491726 - it was working for me with most Unicode chars... I think Chinese ones were still not visible, due to a font limitation, but after copying to Notepad++ I was able to see them as well – Artemy Vysotsky Oct 01 '17 at 12:50
  • @BasileStarynkevitch I need the program to be able to display Chinese and Japanese characters too, I don't think UTF-8 would be the best choice. – Eärendil Baggins Oct 01 '17 at 12:55
  • @ArtemyVysotsky Doesn't work for me in a C console application. Probably worked with C++ and `iostream` only. – Eärendil Baggins Oct 01 '17 at 12:56
  • Another suggestion: switch to some Linux distribution, it has a much better UTF-8 support than what apparently Windows has. – Basile Starynkevitch Oct 01 '17 at 14:03

4 Answers

8

This code works for me (VS2017, project with Unicode enabled):

#include <stdio.h>
#include <io.h>
#include <fcntl.h>

int main()
{
    _setmode(_fileno(stdout), _O_U16TEXT);
    wchar_t * test = L"the 来. Testing unicode -- English -- Ελληνικά -- Español." ;

    wprintf(L"%s\n", test);
}

This is the console output: [image: output]

After copying it to Notepad++ I see the proper string:

the 来. Testing unicode -- English -- Ελληνικά -- Español.

OS - Windows 7 English, Console font - Lucida Console

Edits based on comments

I tried to fix the above code to work with VS2019 on Windows 10, and the best I could come up with is this:

#include <stdio.h>
int main()
{
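    /* const wchar_t* rather than const auto*: type deduction with auto is C++, and the question is about C */
    const wchar_t* test = L"the 来. Testing unicode -- English -- Ελληνικά -- Español.";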

    wprintf(L"%s\n", test);
}

When it is run "as is" I see: [image: Default console settings]

When it is run with the console set to the Lucida Console font and UTF-8 encoding I see: [image: Console switched to UTF-8]

As for the 来 character being shown as an empty rectangle, I suppose it is a limitation of the font, which does not contain all the Unicode glyphs.

When text is copied from the last console to Notepad++ all characters are shown correctly

Artemy Vysotsky
  • Okay, this works and with MS Mincho I also see the 来 properly on the console. Turns out it got me an error because I tried to use a normal `printf` too after the obscure `_setmode` call. It probably breaks char printing but it's fine as I always print UTF-16 in this project. – Eärendil Baggins Oct 01 '17 at 13:29
  • doesn't work for me with VS2019 until adding const to `wchar_t const * test` – Soup Endless May 13 '20 at 12:55
  • Why isn't 来 displayed correctly when running your code? I have this issue too, but other Unicode symbols seem to be displayed correctly. Thanks! – Soup Endless May 13 '20 at 13:00
  • 1
    Updated the answer based on comments from @SoupEndless – Artemy Vysotsky May 13 '20 at 16:58
5

The character '来' may not be in your system code page. You need to save the source file as UTF-8 so that the string literal is stored as UTF-8 bytes.

In VS2013, I tried this:

// save this source file as UTF-8
#pragma execution_character_set( "utf-8" )

#include <stdio.h>   // printf
#include <Windows.h> // SetConsoleOutputCP

const char *s = "the 来";

int main(){
    // set console code page to utf-8
    SetConsoleOutputCP(65001);
    printf("%s\n",s);
    return 0;
}
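
To make the "save as UTF-8" step more concrete, here is a minimal sketch of what ends up in the string and is sent to the console once code page 65001 is active. The helper name `utf8_encode` is hypothetical and not part of the answer; it only illustrates the standard UTF-8 encoding rules. For 来 (U+6765) it produces the three bytes E6 9D A5.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the answer): encodes one Unicode code point
   as UTF-8. Writes up to 4 bytes into out and returns the number of bytes,
   or 0 if cp is a surrogate or lies beyond U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                      /* ASCII: one byte, unchanged */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* two bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* three bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                     /* surrogates are not valid code points */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

If the source file is saved in a legacy code page that lacks 来, the compiler never sees these three bytes, which is one way to end up with a `?` in the output.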
FrankRx
  • This did the trick for me. A note for future people though, execution_character_set is deprecated. Microsoft recommends either prefixing the string literal with `u8` or adding /execution-charset:utf8 to the compile args. https://learn.microsoft.com/en-us/cpp/preprocessor/execution-character-set?view=msvc-170 – MrZoraman Aug 13 '22 at 06:44
3

A question mark usually means Windows was unable to convert the character to the destination codepage. In the console a hollow square means the Unicode character was received correctly but it could not be displayed because the console font does not support it or it is a complex script requiring Uniscribe which the console does not handle. You can copy the square and paste it in Notepad/Wordpad and it should display correctly.

The WriteConsoleW Windows function can display Unicode characters and works all the way back to Windows NT. It can only write to the console, so you must use WriteFile instead when the output is redirected. GetConsoleMode fails on redirected handles, which gives you a way to tell the two cases apart.
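
The approach described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name `write_wide` is hypothetical, and converting to UTF-8 for redirected output is my choice, not something the answer prescribes (you might prefer raw UTF-16 with a BOM instead). The block is guarded with `_WIN32` because the console API does not exist on other platforms.

```c
#ifdef _WIN32   /* Windows-only: the console API used here does not exist elsewhere */
#include <windows.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper (not from the answer): writes a wide string to stdout.
   Uses WriteConsoleW for a real console and WriteFile for redirected output.
   Returns 1 on success, 0 on failure. */
int write_wide(const wchar_t *text)
{
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD mode = 0, written = 0;
    int len = (int)wcslen(text);

    if (out == NULL || out == INVALID_HANDLE_VALUE)
        return 0;
    if (len == 0)
        return 1;                         /* nothing to write */

    /* GetConsoleMode succeeds only when the handle is a real console. */
    if (GetConsoleMode(out, &mode))
        return WriteConsoleW(out, text, (DWORD)len, &written, NULL) != 0;

    /* Redirected to a file or pipe: convert UTF-16 to UTF-8 (an assumed
       choice of encoding) and write the raw bytes with WriteFile. */
    int bytes = WideCharToMultiByte(CP_UTF8, 0, text, len, NULL, 0, NULL, NULL);
    if (bytes <= 0)
        return 0;
    char *buf = malloc((size_t)bytes);
    if (buf == NULL)
        return 0;
    WideCharToMultiByte(CP_UTF8, 0, text, len, buf, bytes, NULL, NULL);
    BOOL ok = WriteFile(out, buf, (DWORD)bytes, &written, NULL);
    free(buf);
    return ok != 0;
}
#endif
```

Calling `write_wide(L"the 来\n")` from `main` would then bypass the C runtime's stream translation entirely, so it should not interfere with ordinary `printf` calls the way `_setmode` can.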

You don't say which VS version you are using and things have changed over the years but Unicode output has been decent since VS2005 if you call _setmode(_fileno(stdout), _O_U16TEXT); early in main():

#include <stdio.h>
#include <io.h>
#include <fcntl.h>

int main()
{
    _setmode(_fileno(stdout), _O_U16TEXT); // Call this before writing anything

    const wchar_t * test = L"the 来" ; // const: string literals must not be modified
    wprintf(L"Text is %s.\n", test);
    return 0;
}

See also: Myth busting in the console

Anders
1

This is what worked for me:

#include <locale.h>

and in the main function,

setlocale(LC_ALL, "en_US.UTF-8");
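
A slightly fuller sketch of the same idea, with the caveat that `setlocale` returns NULL when the requested locale is not available, so the result is worth checking. The helper names `use_locale` and `print_wide_utf8` are hypothetical. As far as I know, the Microsoft C runtime only accepts UTF-8 locale names on Windows 10 version 1803 or later, so treat that as an assumption to verify on your system.

```c
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: returns 1 if the runtime accepted the locale, 0 otherwise. */
int use_locale(const char *name)
{
    return setlocale(LC_ALL, name) != NULL;
}

/* Hypothetical helper: prints a wide string only if a UTF-8 locale could be
   selected. Returns 1 on success, 0 on failure. */
int print_wide_utf8(const wchar_t *text)
{
    if (!use_locale("en_US.UTF-8")) {
        fputs("UTF-8 locale not available\n", stderr);
        return 0;
    }
    /* %ls is the standard conversion for a wide string argument. */
    return wprintf(L"%ls\n", text) >= 0;
}
```

In `main` this would be used as `print_wide_utf8(L"the 来");`. Whether the glyph is actually drawn still depends on the console font.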
Pnemonic