I'm trying to port a C++11 program from Windows to Linux (GCC 4.9). Originally, I just set the locale inside the program:

setlocale(LC_ALL, "");

However, accented characters were displayed as missing-character glyphs on Linux (latest version of Linux Mint). I then saved all my source files as UTF-8, which fixed the problem on Linux, but now all the accented characters are garbled on Windows.

If it helps, the language is French. Is there a way to display the text correctly on both platforms without too much trouble?

I'd appreciate help, thank you.

void EcranBienvenue()
{
    char coinHG = (char)201;
    char coinHD = (char)187;
    char coinBG = (char)200;
    char coinBD = (char)188;
    char ligneH = (char)205;
    char ligneV = (char)186;
#ifdef _WIN32
    system("cls");
#elif defined __linux__
    system("clear");
#else
    cout << string(20,'\n');
#endif
    setlocale(LC_ALL, "C");
    cout << coinHG;
    for (int i = 0; i < 48; i++)
        cout << ligneH;
    cout << coinHD << endl;
    cout << ligneV << "                                                " << ligneV << endl;
    cout << ligneV << "     Les productions                 inc        " << ligneV << endl;
    cout << ligneV << "                                                " << ligneV << endl;
    cout << ligneV << "     Système de gestion des abonnements         " << ligneV << endl;
    cout << ligneV << "                                                " << ligneV << endl;
    cout << coinBG;
    for (int i = 0; i < 48; i++)
        cout << ligneH;
    cout << coinBD << endl;
    setlocale(LC_ALL, "");

}

It's expected that the border doesn't work on Linux yet. However, the three lines of text are displayed correctly in the Linux terminal.

On Windows, "è" is displayed as an incorrect character.

Système de gestion des abonnements 
Tristan
  • Where are the characters coming from? How are you displaying them? It would help to have a code sample. – M.M Mar 18 '15 at 01:50
  • "it was *displaying missing characters* on Linux" - what a way to phrase it...! – Tony Delroy Mar 18 '15 at 02:03
  • @TonyD Black diamond with question mark in the center – Tristan Mar 18 '15 at 02:11
  • You'll have to be way more specific than that. One thing, using `-à` in your source code is unreliable; stick to basic ASCII (32 - 127) in your source. – M.M Mar 18 '15 at 02:21
  • @MattMcNabb see the edit. Unfortunately, French requires accented letters like "à" and "é" to make sense. – Tristan Mar 18 '15 at 02:33
  • @Tristan yes but you can't do that by embedding it in your source code – M.M Mar 18 '15 at 02:34

2 Answers

C++ does not define any encoding for (narrow) strings. Windows typically uses a legacy code page such as CP-1252, while Linux typically uses UTF-8. Use std::wstring and std::wcout.

StenSoft
  • And use wide-character literals of the form `L"è"`. Although you may need to use some compiler-specific option to get it to read the source properly. – Mark Ransom Mar 18 '15 at 04:06
  • This isn't enough either. Typically what will happen if you write: `std::wcout << L"è";` is that the write will fail and set `wcout` to a bad state. On platforms like Linux you can fix this by imbuing `wcout` with an appropriate locale. On Windows you can try this as well, but you'll be limited to a particular codepage. I would strongly recommend _against_ using wchar_t, wcout, wstring, for portable code; this is really not what wchar is intended for or good at: http://stackoverflow.com/a/11107667/365496 – bames53 Apr 15 '15 at 18:55

There are lots of different ways to do this sort of thing, but there are certainly some bad ways. A few things I strongly recommend avoiding:

  • Do not change the global C or C++ locales, ever. For the most part, just avoid locales altogether.
  • Do not use wchar_t (except hidden inside APIs you implement across platforms; use wchar_t only for your Windows implementation).
  • Don't use legacy encodings except where absolutely required. (Legacy encodings are everything except UTF-8, UTF-16 and UTF-32.)

The problems you're seeing are because you're passing text data between interfaces using the wrong encodings.

For example:

Système de gestion des abonnements

This happens because you're passing UTF-8 encoded text to an interface that expects data encoded with (probably) Microsoft's code page 850 (your console's OEM code page).
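To see the mismatch concretely, it can help to look at the raw bytes. The sketch below uses a hypothetical helper, `hex_bytes` (not part of the original code), to dump a string as hex. In UTF-8, "è" is the two bytes C3 A8; code page 850 reads those same bytes as the two characters "├" and "¿", which is why one accented letter turns into two wrong ones.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper for illustration: returns the bytes of `s` as
// space-separated uppercase hex pairs.
std::string hex_bytes(const std::string& s) {
    std::string out;
    char buf[4];
    for (unsigned char c : s) {
        std::snprintf(buf, sizeof buf, "%02X", static_cast<unsigned>(c));
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}
```

Calling `hex_bytes("\xC3\xA8")` yields `C3 A8`. Running the same helper on a literal typed directly in a source file shows which encoding the compiler actually stored.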

You need to know what encoding an interface requires in order to use it. You also need to know what encoding your data is using. To that end, you should choose a consistent encoding to use in your code, and at interface boundaries convert other data to and from that encoding as necessary. I believe UTF-8 is the best choice for cross platform code.
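One way to be sure which encoding your string data uses, independent of how the source file was saved, is to spell non-ASCII characters with escapes. This is a minimal sketch under that assumption; the function name `banner_title_utf8` is made up for illustration.

```cpp
#include <string>

// Hypothetical helper: returns the banner text as UTF-8 bytes, regardless of
// the encoding the source file itself is saved in.
std::string banner_title_utf8() {
    // "\xC3\xA8" is the UTF-8 encoding of U+00E8 (è). The literal is split
    // right after the escape so that a following letter can never be absorbed
    // into the hex escape (a real risk with the letters a-f).
    return "Syst\xC3\xA8" "me de gestion des abonnements";
}
```

The returned string is 35 bytes long for 34 visible characters, because "è" takes two bytes in UTF-8. Whatever prints it still has to hand those bytes to an interface that expects UTF-8, or convert them first.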


Due to shortcomings in MSVC's implementation of the standard C and C++ IO facilities, you are probably best off implementing your own IO API with a native Win32 implementation.

Here's a page that talks about implementing output functionality on Windows.

The print function implemented in this article takes wchar_t input. Here's one way to convert UTF-8 to UTF-16/wchar_t:

#include <codecvt>
#include <locale>

std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> convert;

std::string str = "Système de gestion des abonnements";
UPrint(convert.from_bytes(str).c_str());
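As an alternative to `<codecvt>`, the conversion and the console write can both be done with Win32 calls. The following is only a sketch: `print_utf8` is a hypothetical name, and it assumes that on Windows you want `MultiByteToWideChar` plus `WriteConsoleW` when standard output is a real console, and a plain byte write everywhere else (other platforms, or output redirected to a file).

```cpp
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: writes UTF-8 text to standard output.
// Returns true if the whole string was written.
bool print_utf8(const std::string& utf8) {
    if (utf8.empty()) return true;
#ifdef _WIN32
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD mode = 0;
    // GetConsoleMode succeeds only when the handle refers to a console.
    if (out != INVALID_HANDLE_VALUE && GetConsoleMode(out, &mode)) {
        const int len = static_cast<int>(utf8.size());
        // First call asks for the required number of UTF-16 code units.
        int wlen = MultiByteToWideChar(CP_UTF8, 0, utf8.data(), len, nullptr, 0);
        if (wlen <= 0) return false;
        std::wstring wide(static_cast<std::size_t>(wlen), L'\0');
        MultiByteToWideChar(CP_UTF8, 0, utf8.data(), len, &wide[0], wlen);
        DWORD written = 0;
        return WriteConsoleW(out, wide.data(), static_cast<DWORD>(wlen),
                             &written, nullptr) != 0;
    }
#endif
    // Not a Windows console: pass the UTF-8 bytes through unchanged.
    return std::fwrite(utf8.data(), 1, utf8.size(), stdout) == utf8.size();
}
```

A call such as `print_utf8("Syst\xC3\xA8" "me\n")` should then display correctly on both platforms. If you mix this with `std::cout`, flush `cout` first so the output stays in order.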

Additionally, you could implement a streambuf that correctly handles writing to the Windows console and replace the streambuf in std::cout with it, so that printing to cout would then print correctly to the console. Remember to restore the original streambuf before exiting so that the destruction of cout can succeed. You could have an RAII-type object handle both setting the stream buffer and switching it back later.

Such a program might look like:

int main() {
  Set_utf8_safe_streambuf buffer_swapper(std::cout); // on windows swaps cout's streambuf with one that can print UTF-8 to the console, does nothing on other platforms

  std::cout << "Système de gestion des abonnements" << '\n'; // utf-8 data
}

Here's an answer with a few details on implementing and swapping a streambuf.

bames53