
I'm having a problem writing French characters to the console in C++. The string is loaded from a file using std::ifstream and std::getline and then printed to the console using std::cout. Here is what the string is in the file:

La chaîne qui correspond au code "TEST_CODE" n'a pas été trouvée à l'aide locale "fr".

And here is how the string is being printed:

La cha¯ne qui correspond au code "TEST_CODE" n'a pas ÚtÚ trouvÚe Ó l'aide locale "fr".

How can I fix this problem?

dda
jmegaffin
  • I assume you're using Windows? – Mark Ransom Nov 15 '12 at 03:55
  • Yes I am, will modify my question to specify. – jmegaffin Nov 15 '12 at 04:00
  • @Boreal: Make sure that you convert your string stored in the file to Unicode UTF-16 (which makes sense as Unicode encoding to be used inside a Windows application). You can do that reading the string from your file and then using `MultiByteToWideChar()` API (or ATL conversion helper `CA2W`) to convert from your specific encoding to UTF-16. Then, to print a Unicode string to console, you just need to initialize the console with `_setmode(_fileno(stdout), _O_U16TEXT);`, and then you can use `wprintf()` or `std::wcout`. See my answer for further details and links. – Mr.C64 Nov 15 '12 at 08:26

2 Answers


The issue is that the console uses a different code page than the rest of the system. For example, Windows systems set up for the Americas and Western Europe normally use CP1252 as the system (ANSI) code page, but the console in those regions defaults to the OEM code page CP437 or CP850.
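As a hedged illustration of that mismatch, the sketch below decodes the three accented bytes of the sample sentence under both code pages. It assumes the file is stored as CP1252 and the console uses CP850, which is consistent with the mangled output in the question. The function names are made up for this example, and the CP850 table lists only the three bytes involved.

```cpp
#include <cstdint>

// Code point for a byte read as Windows-1252.
// Only 0xA0-0xFF is handled, where CP1252 coincides with Latin-1;
// other bytes return 0 (a simplification for this sketch).
inline char32_t decode_cp1252_high(std::uint8_t byte) {
    return byte >= 0xA0 ? static_cast<char32_t>(byte) : 0;
}

// Code point for a byte read as CP850.
// Only the three bytes that occur in the sample sentence are listed.
inline char32_t decode_cp850_sample(std::uint8_t byte) {
    switch (byte) {
        case 0xEE: return 0x00AF; // MACRON (file meant U+00EE, i with circumflex)
        case 0xE9: return 0x00DA; // CAPITAL U WITH ACUTE (file meant U+00E9, e with acute)
        case 0xE0: return 0x00D3; // CAPITAL O WITH ACUTE (file meant U+00E0, a with grave)
        default:   return 0;      // not covered by this sketch
    }
}
```

The same byte yields a different character depending on which table is applied, so the data in the file is intact and only the interpretation at the console is wrong.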

You can either set the console output code page to match the encoding you're using or you can convert the strings to match the console's output code page.

Set the console output codepage:

SetConsoleOutputCP(GetACP()); // Requires <windows.h>. GetACP() returns the system (ANSI) code page.
std::cout << "La chaîne qui correspond au code \"TEST_CODE\" n'a pas été trouvée à l'aide locale \"fr\".";

Or one of many ways to convert between encodings (this one requires VS2010 or greater):

#include <codecvt> // for wstring_convert
#include <locale>  // for codecvt_byname
#include <iostream>

int main() {
    typedef std::codecvt_byname<wchar_t,char,std::mbstate_t> codecvt;

    // the following relies on non-standard behavior, codecvt destructors are supposed to be protected and unusable here, but VC++ doesn't complain.
    std::wstring_convert<codecvt> cp1252(new codecvt(".1252"));
    std::wstring_convert<codecvt> cp850(new codecvt(".850"));

    std::cout << cp850.to_bytes(cp1252.from_bytes("...été trouvée à...\n")).c_str();
}

The latter example assumes you do in fact need to convert between 1252 and 850. You should probably use the function GetOEMCP() to figure out the actual target code page, and the source codepage actually depends on what you use for the source code rather than on the result of GetACP() on the machine running the program.

Also note that this program relies on something not guaranteed by the standard: that the wchar_t encoding be shared between locales. This is true on most platforms—usually some Unicode encoding is used for wchar_t in all locales—but not all.


Ideally you could just use UTF-8 everywhere and the following would work fine, as it does on other platforms these days:

#include <iostream>

int main() {
    std::cout << "La chaîne qui correspond au code \"TEST_CODE\" n'a pas été trouvée à l'aide locale \"fr\".\n";
}

Unfortunately Windows can't support UTF-8 this way without either abandoning UTF-16 as the wchar_t encoding and adopting a 4 byte wchar_t, or violating requirements of the standard and breaking standard conforming programs.
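For context on what "UTF-8 everywhere" means at the byte level, here is a minimal sketch of a UTF-8 encoder for a single code point. The function name `encode_utf8` is hypothetical, and the sketch does not reject surrogates or values above U+10FFFF. It shows that each accented letter in the sample string occupies two bytes in UTF-8, so whatever displays the text has to decode both bytes together using the same encoding.

```cpp
#include <string>

// Encode one Unicode code point as UTF-8.
// Assumes cp <= 0x10FFFF; no validation is done in this sketch.
inline std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For example, `encode_utf8(0x00E9)` (é) produces the two bytes 0xC3 0xA9, whereas CP1252 stores the same letter as the single byte 0xE9.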

bames53
  • When you say set the output code page of the console, how would I go about doing that? – jmegaffin Nov 15 '12 at 04:01
  • @Boreal, use the `chcp` command to show the current code page or to set it to something else. – Mark Ransom Nov 15 '12 at 04:02
  • I agree with this answer in principle, but the code pages don't seem to match up. I can't find a combination that agrees with the sample input and output. – Mark Ransom Nov 15 '12 at 04:03
  • @MarkRansom It looks to me like the mangling matches up with an input using CP1252 and output using CP850 – bames53 Nov 15 '12 at 04:09
  • Do the world a favor and use `CP_UTF8` as the parameter. (Assuming it works. I haven't tested it.) – asveikau Nov 15 '12 at 04:14
  • @asveikau Unfortunately CP_UTF8 doesn't work on a console window without also changing how the output is written. Writing UTF-8 with `std::cout` doesn't work (at least up to Windows 7) because the implementation of std::cout outputs one byte at a time; so each UTF-8 code unit is converted individually which means the console thinks it's getting a bunch of illegal encodings instead of a single valid multibyte sequence. Printing UTF-8 _does_ work if you're willing to write everything using `puts()`, because Microsoft's implementation of `puts()` doesn't split things up. – bames53 Nov 15 '12 at 04:23
  • @asveikau Oh, and even if you do write the strings correctly you still have to coerce VC++ into producing UTF-8 string literals, which isn't as straightforward as it ought to be. – bames53 Nov 15 '12 at 04:28
  • If the C++ library splits things and that is a problem for the Win32 layer then that is also a problem. `CP_UTF8` isn't the only "multi-byte code page" (Microsoft's term) where chars are variable length and possibly more than 1 byte. – asveikau Nov 15 '12 at 05:06
  • @asveikau CP_UTF8 is the only one that exhibits this problem though. Other multibyte codepages can be converted one byte at a time I think due to the use of a shift state. UTF-8 could be handled that way, but isn't. – bames53 Nov 15 '12 at 05:09

If you want to write Unicode characters in the console, you have to do some initialization:

_setmode(_fileno(stdout), _O_U16TEXT);

Then your French characters are displayed correctly (I've tested it using Consolas as my console font):

#include <fcntl.h>  // _O_U16TEXT
#include <io.h>     // _setmode
#include <stdio.h>  // _fileno, stdout

#include <iostream>
#include <ostream>
#include <string>

using namespace std;

int main() 
{
    // Prepare console output in Unicode
    _setmode(_fileno(stdout), _O_U16TEXT);


    //
    // Build Unicode UTF-16 string with French characters
    //

    // 0x00EE - LATIN SMALL LETTER I WITH CIRCUMFLEX
    // 0x00E9 - LATIN SMALL LETTER E WITH ACUTE
    // 0x00E0 - LATIN SMALL LETTER A WITH GRAVE

    wstring str(L"La cha");
    str += L'\x00EE';
    str += L"ne qui correspond au code \"TEST_CODE\" ";
    str += L"n'a pas ";
    str += L'\x00E9';
    str += L't';
    str += L'\x00E9';
    str += L" trouv";
    str += L'\x00E9';
    str += L"e ";
    str += L'\x00E0';
    str += L" l'aide locale \"fr\".";


    // Print the string to the console
    wcout << str << endl;  
}

Michael Kaplan's blog posts on this subject are also worth reading.

Moreover, if you are reading text from a file, you have to know which encoding it uses: UTF-8? UTF-16LE? UTF-16BE? Some specific code page? You can then convert from that encoding to UTF-16 and use UTF-16 inside the Windows application. To convert from a code page (or from UTF-8) to UTF-16, you can use the MultiByteToWideChar() API or the ATL conversion helper class CA2W.
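One common hint about a file's encoding is a byte order mark (BOM) at its start. The following is a minimal, portable sketch of BOM detection; the `Bom` enum and `detect_bom` function are hypothetical names for this example. It is only a heuristic: a file without a BOM may still be UTF-8 or use a legacy code page such as CP1252, and in that case the encoding has to be known some other way.

```cpp
#include <cstddef>
#include <string>

enum class Bom { None, Utf8, Utf16LE, Utf16BE };

// Inspect the first bytes of a file's contents for a byte order mark.
// UTF-32 BOMs are not distinguished in this sketch.
inline Bom detect_bom(const std::string& data) {
    // Read bytes as unsigned so values above 0x7F compare correctly.
    auto byte = [&](std::size_t i) {
        return static_cast<unsigned char>(data[i]);
    };
    if (data.size() >= 3 && byte(0) == 0xEF && byte(1) == 0xBB && byte(2) == 0xBF)
        return Bom::Utf8;     // EF BB BF
    if (data.size() >= 2 && byte(0) == 0xFF && byte(1) == 0xFE)
        return Bom::Utf16LE;  // FF FE
    if (data.size() >= 2 && byte(0) == 0xFE && byte(1) == 0xFF)
        return Bom::Utf16BE;  // FE FF
    return Bom::None;         // no BOM: encoding must be known by other means
}
```

To use it, open the file in binary mode (`std::ios::binary`), read the first few bytes into a `std::string`, and pass them to `detect_bom` before choosing the source encoding for the conversion to UTF-16.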

Community
Mr.C64
  • finally following your advice got é è ñ ç â ... thanks! I still wonder this is so little explained spanish is as I think the second language in this ball! – Anxon Pués Nov 11 '18 at 12:54
  • I don't have _fileno nor _O_U16TEXT. Where can I get them? – ploosu2 Feb 07 '19 at 15:40
  • @ploosu2 They are part of the Visual C++’s CRT. You can look them up in the MSDN documentation. – Mr.C64 Feb 07 '19 at 23:54