I've been trying to write a C++ application for a project and I ran into this issue. Basically:

class OBSClass
{
public:
    wstring ClassName;
    uint8_t Credit;
    uint8_t Level;
    
    OBSClass() : ClassName(), Credit(), Level() {}
    OBSClass(wstring name, uint8_t credit, uint8_t hyear)
    : ClassName(name), Credit(credit), Level(hyear)
    {}
};

In some other file:

vector<OBSClass> AllClasses;
...
AllClasses.push_back(OBSClass(L"Bilişim Sistemleri Mühendisliğine Giriş", 3, 1));
AllClasses.push_back(OBSClass(L"İş Sağlığı ve Güvenliği", 3, 1));
AllClasses.push_back(OBSClass(L"Türk Dili 1", 2, 1));
... (rest omitted; some of the entries have non-ASCII characters like 'ş' and 'İ')

I have a function that basically outputs everything in AllClasses. The problem is that wcout does not output as desired.

void PrintClasses()
{
    for (size_t i = 0; i < AllClasses.size(); i++)
    {
        wcout << "Class: " << AllClasses[i].ClassName << "\n";
    }
}

The output is 'Class: Bili' and nothing else. The program does not even try to output the other entries and just hangs. I am on Windows using G++ 6.3.0. I am not using Windows' cmd; I am using bash from MinGW, so encoding should not be a problem (or is it?). Any advice?

Edit: Source code encoding is not the problem either. I just checked, and it is UTF-8, the VSCode default.

Edit: I also checked whether the problem is with string literals.

wstring test;
wcin >> test;
wcout << test;

I entered some non-ASCII characters like 'ö' and 'ş', and it works perfectly. What is the problem with wide string literals?

Edit: Here you go

#include <iostream>
#include <string>
#include <vector>

using namespace std;

vector<wstring> testvec;

int main()
{
    testvec.push_back(L"Bilişim Sistemleri Mühendisliğine Giriş");
    testvec.push_back(L"ıiÖöUuÜü");
    testvec.push_back(L"☺☻♥♦♣♠•◘○");
    for (size_t i = 0; i < testvec.size(); i++)
        wcout << testvec[i] << "\n";
    return 0;
}

Compile with G++: g++ file.cc -O3

This code only outputs 'Bili'. It must be something with g++ screwing up the binary encoding (?), since entering values with wcin and then outputting them with wcout does not cause any problem.

Zombo
Yahya Gedik
  • Did you remember to save your source code file, specifically the one that has the Unicode string literals, in a Unicode format such as UTF-8 or UTF-16? – selbie Apr 26 '18 at 23:59
  • Yes. I am using VSCode, which has default encoding of UTF8, just checked – Yahya Gedik Apr 27 '18 at 00:02
  • If the issue is with output of a string, why all of this code with vectors and classes to show the problem? Just a 1 line `main` function, simply `std::wcout << L"Your string";`. – PaulMcKenzie Apr 27 '18 at 00:05
  • It's because that '`main` function' works – Yahya Gedik Apr 27 '18 at 00:10
  • How can outputting the string you're having trouble with work in one function (`main`), and then fail in another if you're outputting the same characters? Sounds like you have a bug, not an issue with outputting a string in general. Post a [mcve], since posting all of this code instead of a simple 1 or 2 line program raises the suspicion that the error has nothing to do with encoding. – PaulMcKenzie Apr 27 '18 at 00:18

2 Answers

The following code works for me, using MinGW-w64 7.3.0 in both MSYS2 Bash, and Windows CMD; and with the source encoded as UTF-8:

#include <iostream>
#include <locale>
#include <string>
#include <codecvt>

int main()
{
    std::ios_base::sync_with_stdio(false);

    std::locale utf8( std::locale(), new std::codecvt_utf8_utf16<wchar_t> );
    std::wcout.imbue(utf8);

    std::wstring w(L"Bilişim Sistemleri Mühendisliğine Giriş");
    std::wcout << w << '\n';
}

Explanation:

  • The Windows console doesn't support any sort of 16-bit output; it only supports ANSI and, partially, UTF-8. So you need to configure wcout to convert the output to UTF-8. ANSI is the default for backwards compatibility purposes, though Windows 10 1803 does add an option to set the default to UTF-8 (ref).
  • imbue with a codecvt_utf8_utf16 achieves this; however, you also need to disable sync_with_stdio, otherwise the stream doesn't even use the facet: it just defers to stdout, which has a similar problem.

For writing to other files, I found the same technique works to write UTF-8. For writing a UTF-16 file you need to imbue the wofstream with a UTF-16 facet, see example here, and manually write a BOM.
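
As a minimal sketch of the UTF-8 file case (this is not the code from the linked example), the same facet can be imbued into a wofstream before the file is opened. The names `write_utf8_file` and `read_bytes` are hypothetical helpers introduced only for illustration, and the sketch assumes the text stays within the Basic Multilingual Plane so that it behaves the same whether wchar_t is 16 or 32 bits wide.

```cpp
#include <codecvt>
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

// Hypothetical helper: writes `text` to `path` as UTF-8 through a wide stream.
bool write_utf8_file(const std::string& path, const std::wstring& text)
{
    std::wofstream out;
    // Imbue before opening so the facet is in place for every write.
    out.imbue(std::locale(std::locale(), new std::codecvt_utf8_utf16<wchar_t>));
    out.open(path, std::ios::binary);
    if (!out)
        return false;
    out << text;
    out.close();
    return !out.fail();
}

// Hypothetical helper: reads the raw bytes back so the encoding can be checked.
std::string read_bytes(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Writing a string that contains 'ş' (U+015F) this way should store that character as the two bytes 0xC5 0x9F, which `read_bytes` lets you confirm.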


Commentary: Many people just avoid trying to use wide iostreams completely, due to these issues.

You can write a UTF-8 file using a narrow stream and, if you are using wstring internally, call a function that converts wstring to UTF-8 before writing. You can of course also use UTF-8 internally.
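
One way to sketch that conversion step is with std::wstring_convert and the codecvt_utf8_utf16 facet from the program above. The names `to_utf8` and `write_utf8` are hypothetical helpers, not part of any library. Both std::wstring_convert and the `<codecvt>` facets are deprecated since C++17, so treat this as an illustration rather than a recommendation.

```cpp
#include <codecvt>
#include <iostream>
#include <locale>
#include <string>

// Hypothetical helper: converts a UTF-16 encoded wstring to a UTF-8 std::string.
// Throws std::range_error if the input is not valid UTF-16.
std::string to_utf8(const std::wstring& wide)
{
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
    return converter.to_bytes(wide);
}

// Writes the converted bytes through any narrow stream (std::cout, std::ofstream, ...).
void write_utf8(std::ostream& os, const std::wstring& wide)
{
    os << to_utf8(wide) << '\n';
}
```

With this approach the wide stream and its locale never come into play: `write_utf8(std::cout, name)` only ever sends bytes to the narrow stream.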

Of course you can also write a UTF-16 file using a narrow stream, just not with operator<< from a wstring.
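
A rough sketch of that idea: build the UTF-16 bytes yourself and hand them to the narrow stream with write() instead of operator<<. The helper names `utf16le_bytes` and `write_utf16le` are hypothetical. The sketch assumes the text is already held as UTF-16 code units in a std::u16string (on Windows a wstring holds the same code units) and that little-endian output with a BOM is wanted.

```cpp
#include <ostream>
#include <string>

// Hypothetical helper: serializes UTF-16 code units as little-endian bytes,
// preceded by the byte order mark FF FE.
std::string utf16le_bytes(const std::u16string& text)
{
    std::string bytes("\xFF\xFE", 2);
    for (char16_t unit : text) {
        bytes.push_back(static_cast<char>(unit & 0xFF));        // low byte first
        bytes.push_back(static_cast<char>((unit >> 8) & 0xFF)); // then high byte
    }
    return bytes;
}

// Writes raw bytes with write(); the stream should be opened in binary mode
// so that no newline translation corrupts the data.
void write_utf16le(std::ostream& os, const std::u16string& text)
{
    const std::string bytes = utf16le_bytes(text);
    os.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
}
```

For example, 'ş' (U+015F) comes out as the bytes 5F 01 after the BOM.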

Zombo
M.M
  • cppreference.com says that `codecvt_utf8` is deprecated since C++17, but it doesn't say anything about what to use instead, so I'm sticking with it for now... – M.M Apr 27 '18 at 04:36
  • Never mind. For some reason your answer did not work; I managed to fix it by using the `setlocale` function. Reference: [Here](https://stackoverflow.com/a/26496567/6624781). Apparently imbuing `wcout` is not enough. – Yahya Gedik Apr 27 '18 at 19:44
  • `codecvt_utf8` is indeed deprecated, should use `codecvt_utf8_utf16` instead. I'll update the answer with it. – Jonathan Dec 06 '18 at 14:48
  • FYI I posted an answer using newer software – Zombo Oct 18 '20 at 03:46
  • @Jonathan cppreference says `codecvt_utf8_utf16` is also deprecated from C++17. [link](https://en.cppreference.com/w/cpp/locale/codecvt_utf8_utf16) – User 10482 Feb 06 '23 at 15:55
  • @User10482: I don't see a deprecation note in your link, maybe I'm missing something? – Jonathan Feb 06 '23 at 16:04
  • Next to the signature at the top of the page. Unless the `(deprecated in C++17)` applies to part of the signature. – User 10482 Feb 06 '23 at 16:34

If you have at least Windows 10 1903 (May 2019) and at least Windows Terminal 0.3.2142 (Aug 2019), set the OEM code page to UTF-8 (65001) in the registry:

Windows Registry Editor Version 5.00
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage]
"OEMCP"="65001"

and restart. After that you can use this:

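#include <iostream>
#include <string>

int main() {
   // Narrow strings hold UTF-8 bytes when the source file is saved as UTF-8.
   std::string a[] = {
      "Bilişim Sistemleri Mühendisliğine Giriş",
      "Türk Dili 1",
      "İş Sağlığı ve Güvenliği",
      "ıiÖöUuÜü",
      "☺☻♥♦♣♠•◘○"
   };

   // Iterate by const reference to avoid copying each string.
   for (const auto& s : a) {
      std::cout << s << std::endl;
   }
}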
Zombo