3

While testing some functions that convert strings between wchar_t and UTF-8, I got the following strange result with Visual C++ 2008 Express:

std::wcout << L"élève" << std::endl;

prints out "ÚlÞve:" which is obviously not what is expected.

This is obviously a bug. How can that be? How am I supposed to deal with such a "feature"?

dan04
chmike

5 Answers

12

The C++ compiler does not support Unicode in code files. You have to replace those characters with their escaped versions instead.

Try this:

std::wcout << L"\x00E9l\x00E8ve" << std::endl;

Also, your console must support Unicode as well.

UPDATE:

It's not going to produce the desired output in your console, because the console does not support Unicode.

Dave Van den Eynde
  • Unfortunately, using Dave's code yields exactly the same output. So I guess it means that the shell doesn't support Unicode. – chmike Apr 06 '09 at 13:30
  • It seems I should be able to activate UTF-8 support in the shell by issuing the command chcp 65001. How can I do this from within a program, before writing anything out? – chmike Apr 06 '09 at 13:42
  • It's not going to output the full UTF-16. You're lucky if you get ANSI output, because the high order bytes are knocked off. But the characters are ANSI page 1252 compatible. – Dave Van den Eynde Apr 06 '09 at 14:04
2

I found these related questions with useful answers: "Is there a Windows command shell that will display Unicode characters?" and "How can I embed unicode string constants in a source file?"

Community
chmike
1

This is obviously a bug. How can that be?

While other operating systems have dispensed with legacy character encodings and switched to UTF-8, Windows uses two legacy encodings: An "OEM" code page (used at the command prompt) and an "ANSI" code page (used by the GUI).

Your C++ source file is in ANSI code page 1252 (or possibly 1254, 1256, or 1258), but your console is interpreting it as OEM code page 850.

dan04
1

You might also want to take a look at this question. It shows how you can actually hard-code Unicode characters into source files using some compilers (I'm not sure what the options would be for MSVC).

Community
jkp
0

Your IDE and the compiler use the ANSI code page. The console uses the OEM code page.

It also matters what you are doing with those conversion functions.

Mihai Nita