
I am working with C++17 in Visual Studio 2019. I have read a fair bit about encodings, but I am still not very comfortable with them. I want to output Unicode characters to the screen. For that, I am using the following code:

#include <iostream>
#include <string>
#include <fcntl.h>
#include <io.h>

int main()
{
    std::wstring symbol{ L"♚" };

    _setmode(_fileno(stdout), _O_WTEXT);
    std::wcout << symbol; //This works fine
    std::cout << "Hello"; //This gives "Debug Assertion Failed! Expression: buffer_size % 2 == 0"
    _setmode(_fileno(stdout), O_TEXT); //Need this to make std::cout work normally
    std::cout << "World"; //This works fine
}

So I could set the mode to _O_WTEXT and then back to O_TEXT every time I need to output the std::wstring. However, I am worried this may be an inefficient way to do things. Is there a better way to do it? I have read about something called native wide-character support in C++, but I found it hard to understand. Could anyone enlighten me?

EDIT

To add to the above, using _setmode(_fileno(stdout),_O_U16TEXT) leads to the same behaviour as described above when trying to use std::cout without setting the mode back. If I use _setmode(_fileno(stdout),_O_U8TEXT) instead, my code fails to compile and gives errors 'symbol': redefinition, different basic types and '<<': illegal for class when using std::cout on std::string symbol = <insert any of the possibilities I tried in the snippet below>.

It has been suggested that I use a UTF-8 std::string so that I can use std::cout and avoid having to switch to wide mode. Could anyone give me a hand with how to achieve this? I have tried:

std::string symbol = "\u265A"; //using std::cout gives "?" and triggers warning *
std::string symbol = "♚"; //Same as above

std::string symbol = u8"\u265A"; //using std::cout gives "ÔÖÜ"
std::string symbol = u8"♚"; //Same as above

* Warning C4566: character represented by universal-character-name '\u265A' cannot be represented in the current code page (1252)

I have read it may be possible to convert from std::wstring to UTF-8 std::string using WideCharToMultiByte() from the header <Windows.h>. Would that work? Could anyone offer any help?

Luismi98
  • BTW, unicode is unrelated to wchar/char, you can use utf-8 to display unicode. – Jarod42 Apr 06 '20 at 15:53
  • Yes you have to switch back and forth. Or you can consistently use `wcout` which works with ASCII, example `std::wcout << "Hello"`. If working with Windows 8 or higher, you can switch to using UTF8 for output, but you still need UTF16 for input. – Barmak Shemirani Apr 06 '20 at 16:43
  • With UTF-8 (i.e. codepage 65001), the CRT + console combination is still less than perfect, even in Windows 10. It may occasionally flush only part of a UTF-8 sequence in a write, which the console will decode as the replacement character, U+FFFD. The best results with the console are still with UTF-16 and the console's wide-character API. Certainly that's true for input. The console does not support reading non-ASCII characters as UTF-8. – Eryk Sun Apr 07 '20 at 19:20
  • FYI, you should do an `fflush` of a high-level `FILE` stream before switching the mode of its low-level fd. – Eryk Sun Apr 07 '20 at 19:21
  • @ErykSun Thank you for commenting. What do you mean by the console's wide-character API? Is that the mode setting I describe in the question? I don't know what you mean in the last comment. Why should I do that and what does `fflush` mean? What is "fd"? – Luismi98 Apr 08 '20 at 00:28
  • The C runtime `_write` function switches to calling [`WriteConsoleW`](https://learn.microsoft.com/en-us/windows/console/writeconsole) (in the wide-character console API) instead of `WriteFile` if the native file handle that wraps the fd (file descriptor) is a console file and the fd is in wide-character mode. – Eryk Sun Apr 08 '20 at 00:41
  • @ErykSun I am not experienced enough to understand what that means. Are you suggesting there is a problem with my code? What exactly do you suggest I do? – Luismi98 Apr 08 '20 at 00:49
  • As far as I'm concerned, I suggest you convert the string to UTF-8 and write that to std::cout. The standard streams aren't very good for wide characters. Otherwise, you need to switch back and forth. – Jeaninez - MSFT Apr 13 '20 at 06:25
  • @Jeaninez-MSFT Please see this new [question](https://stackoverflow.com/questions/61196799/how-can-i-convert-a-stdwstring-to-a-utf-8-stdstring) following up on how to convert from std::wstring to a utf-8 std::string. – Luismi98 Apr 13 '20 at 21:09
  • This should just work: `std::cout << "\u265A";` (and it does on my machine). In fact, so does this: `std::cout << "♚"`. Perhaps this is an issue with the configuration of your terminal emulator? – Indiana Kernick Apr 14 '20 at 13:20
  • "...cannot be represented in the current code page (1252)". You could try changing the code page with this: `SetConsoleOutputCP(65001);`. Then try: `std::cout << "\u265A";` – Indiana Kernick Apr 14 '20 at 13:30
  • @IndianaKernick That solved it! Thank you so much. You can post an answer if you want the bounty. Writing `std::cout << "\u265A"` and `std::cout << "♚"` still prints `?`, but writing `std::cout << u8"\u265A"` and `std::cout << u8"♚"` now works! – Luismi98 Apr 14 '20 at 13:35

1 Answer


The clue is in the error message. "...cannot be represented in the current code page (1252)". So the code page needs to be changed. The code page identifier for UTF-8 is 65001. To change the code page, use SetConsoleOutputCP.

Indiana Kernick