32

I'm still learning C++, so bear with me and my sloppy code. The compiler I use is Dev C++. I want to be able to output Unicode characters to the console using cout. Whenever I try things like:

#include <cstdlib>   // for system()
#include <iostream>

int main()
{
    std::cout << "Hello World!\n";
    std::cout << "Blah blah blah some gibberish unicode: ĐĄßĞĝ\n";
    system("PAUSE");
    return 0;
}

It outputs strange characters to the console, like µA■Gg. Why does it do that, and how can I get it to display ĐĄßĞĝ? Or is this not possible with Windows?

einpoklum
Jesse Foley
  • 7
    Just a comment: don't use system("pause"); it's very bad practice. You can use cin instead. http://www.gidnetwork.com/b-61.html – nmuntz May 17 '10 at 12:40
  • Duplicate? http://stackoverflow.com/questions/2492077/output-unicode-strings-in-windows-console-app – Éric Malenfant May 17 '10 at 12:54
  • 24
    Oh god, how do people DO it? How come every newbie is magnetically attracted to Dev C++? That piece of junk was buggy 5 years ago, and guess what? It still is today, **because it hasn't been maintained since then**. There are so many **good** free compilers and IDEs. Why oh why do beginners insist on picking the only one that is absolute crap, lacks basic features, never worked, and is buggy as hell and comes with a prehistoric compiler by default? – jalf May 17 '10 at 13:17
  • 9
    @jalf: your rant would be more useful if you linked to one such good, free compiler and IDE. – Joachim Sauer May 17 '10 at 13:40
  • 5
    @nmuntz: I agree about `system("pause");` but the article you link to is just as bad. For one thing, just `cin.get()` does **not** usually suffice. Pausing does a whole lot more, most prominently cleaning the input buffer. Doing that in a portable, reliable way in C++ is **extremely** hard. In fact, the two solutions I know (ignore 1– `cin.rdbuf()->in_avail()`, 2– `numeric_limits::max()`) fail on different current compilers (they compile but don’t work). The rest of the linked page is a straw-man argument. Who cares that pausing is costly? It’s only called once! – Konrad Rudolph May 17 '10 at 13:45
  • 8
    @Joachim: Fair enough. Microsoft has Visual C++ Express, which includes an excellent compiler and IDE for free. That is pretty much the de facto standard for Windows C++ development. GCC is a top-notch cross-platform compiler, and is often used with the Code::Blocks or Eclipse IDEs. – jalf May 18 '10 at 02:34
  • @JoachimSauer : yes, including everything said by jalf, I actually use MinGW in Windows... works like a charm. – kumarharsh Oct 09 '12 at 18:12
  • I answered a very similar question just a few days ago. The answer is very detailed and includes an example: [Unicode on Console - Chinese Characters](https://stackoverflow.com/a/49479764/2099297) My answer only covers Windows 10 back to Vista, but it is already 2017 now. – David Mar 29 '18 at 06:12
  • 2
    Possible duplicate of [Output unicode strings in Windows console app](https://stackoverflow.com/questions/2492077/output-unicode-strings-in-windows-console-app) – phuclv Jul 28 '18 at 10:15
  • Check my answer on this post https://stackoverflow.com/questions/2492077/output-unicode-strings-in-windows-console-app/54833872#54833872 – Joma Feb 22 '19 at 19:34

5 Answers

19

What about std::wcout ?

#include <iostream>

int main() {
    std::wcout << L"Hello World!" << std::endl;
    return 0;
}

This is the standard wide-character output stream.

Still, as Adrian pointed out, this doesn't address the fact that cmd, by default, doesn't handle Unicode output. This can be addressed by manually configuring the console, as described in Adrian's answer:

  • Starting cmd with the /u argument;
  • Calling chcp 65001 to change the output format;
  • And setting a unicode font in the console (like Lucida Console Unicode).

You can also try _setmode(_fileno(stdout), _O_U16TEXT);, which requires fcntl.h and io.h (as described in this answer, and documented in this blog post).

Community
Tyn
  • This doesn't address the fact that the console is typically in ANSI or OEM mode. – Adrian McCarthy May 17 '10 at 13:21
  • 1
    This is mostly right but... `cmd` does handle Unicode output by default to the console but not when redirected to a file. Use `/u` for it to also output Unicode to redirected files. In both cases "Unicode" means `UTF-16` as per usual on Windows. `chcp 65001` sets the `ANSI` codepage to `UTF-8` which is unrelated to wide characters, `wcout`, and `cmd /u`. You do not need to set the codepage to UTF-8 to output UTF-16!! Furthermore the `WriteFile()` API is broken under `chcp 65001`. The `_setmode()` call is important and required if you want to output characters beyond your ANSI codepage! – hippietrail Apr 18 '11 at 06:43
  • @Adrian: The console does not have an ANSI or OEM mode. It only has an ANSI codepage which by default is an OEM codepage such as 437 or 850. But you do not have to print via this codepage. All Windows text APIs have an `A` version and a `W` version. `A` for ANSI which goes through the codepage, `W` for "wide" which does not go through the codepage but deals directly in UTF-16 Unicode. Both are always present without a requirement or even a possibility of switching a "mode". – hippietrail Apr 18 '11 at 06:48
  • +1 for your suggestion to set a "unicode font in the console". That was the missing piece for me. I thought that doing chcp 65001 alone would enable a unicode font. – George Hernando Jun 05 '19 at 23:14
10

You can use the open-source {fmt} library to portably print Unicode text, including on Windows, for example:

#include <fmt/core.h>

int main() {
  fmt::print("Blah blah blah some gibberish unicode: ĐĄßĞĝ\n");
}

Output:

Blah blah blah some gibberish unicode: ĐĄßĞĝ

This requires compiling with the /utf-8 compiler option in MSVC.

I don't recommend using wcout because it is non-portable, for example:

std::wcout << L"Blah blah blah some gibberish unicode: ĐĄßĞĝ\n";

will print the ĐĄßĞĝ part incorrectly on macOS or Linux (https://godbolt.org/z/z81jbb):

Blah blah blah some gibberish unicode: ??ss??

and doesn't even work on Windows without changing the code page:

Blah blah blah some gibberish unicode:

Disclaimer: I'm the author of {fmt}.

vitaut
  • What does `/utf-8` do? Isn't Windows UTF-16 internally? Isn't it inefficient to convert to UTF-16 at runtime? The whole Win32 is UTF-16, how do you get around that? – Aykhan Hagverdili Dec 28 '20 at 06:05
  • 1
    From the documentation (https://learn.microsoft.com/en-us/cpp/build/reference/utf-8-set-source-and-executable-character-sets-to-utf-8?view=msvc-160) I read that if you don't specify `/utf-8` it will use the user locale code page, which will mean your program might display differently based on the locale settings of the user who compiled the program and maybe also on the locale during execution (yikes!) I don't use windows any more, so take this with a grain of salt, but the link might help you. – Ferdi265 Dec 28 '20 at 11:16
  • 1
    `/utf-8` sets source and execution encoding to UTF-8. Technically it's unnecessary but {fmt} won't do transcoding otherwise in case you are using a legacy encoding. Transcoding only happens when writing to console and is negligible compared to the time it takes to render the text. When the output is redirected there is no transcoding which is another advantage of `{fmt}` compared to `wcout`. – vitaut Dec 28 '20 at 14:24
  • @vitaut Thank you for the extensive comment. A bit of a provoking question: In that case, why not make UTF-8 default in `{fmt}` and let people who are dealing with legacy encoding go the extra mile (when switching to `{fmt}` presumably) instead of literally everyone else? – Aykhan Hagverdili Dec 28 '20 at 18:32
  • 2
    @AyxanHaqverdili, good question. I'm considering switching the default to UTF-8 in the next major version with the opt-out to the old behavior. – vitaut Dec 28 '20 at 18:34
7

I'm not sure Windows XP will fully support what you need. There are three things you have to do to enable Unicode with a command console:

  1. Start the command window with cmd /u. The /u says your programs will output Unicode.
  2. Use chcp 65001 to indicate you want to use UTF-8 instead of one of the code pages.
  3. Select a font with more glyph coverage. The command windows in newer versions of Windows offer Lucida Console Unicode. My XP box has a subset of that called Lucida Console. It doesn't have a very extensive repertoire, but it should be sufficient if you're just trying to display some accented characters.
Adrian McCarthy
  • 1
    +1 for use chcp 65001 - this does the trick. (from cmd /? : /U Causes the output of internal commands to a pipe or file to be Unicode.) – mr calendar Feb 06 '11 at 23:46
  • 3
    1. `/u` only means that built in commands will output UTF-16 when redirected rather than ANSI. It means nothing for your own code or for output that is not redirected. 2. `chcp 65001` does not work properly with UTF-8 console output due to a bug in the `WriteFile()` API which causes it to return the wrong value. This API is called by the standard C library functions such as `printf()` and any of them which check the return code may fail or result in unpredictable behaviour. 3. The font advice is correct and is a silly failing of Windows IMHO. – hippietrail Apr 18 '11 at 06:35
0

You used the ANSI output stream. You need to use

std::wcout << L"Blah blah blah some gibberish unicode: ĐĄßĞĝ\n";

Also, use std::cin.get(), not system("PAUSE").

William Miller
Puppy
  • 1
    Thanks for the tip about cin.get(). I know using system("PAUSE"); is a bad habit, but Dev C++ didn't support anything else I used. Also, wcout isn't recognized by Dev C++. I think I'll follow the advice in the other answers/comments and switch to Visual Studio. I experience fewer problems with that IDE. – Jesse Foley May 21 '10 at 12:13
  • It should also be noted that `system("PAUSE")` is not portable whereas `cin.get()` is. – kfoxon Jan 30 '15 at 16:57
-1

In Linux, I can naively do:

std::cout << "ΐ , Α, Β, Γ, Δ, ,Θ , Λ, Ξ, ... ±, ... etc";

and it works for most of the characters I tried.

quanta