I'm using Visual Studio and C++ on Windows to work with small caps text like ʜᴇʟʟᴏ ꜱᴛᴀᴄᴋᴏᴠᴇʀꜰʟᴏᴡ, generated with e.g. this website. Whenever I read this text from a file or put it directly into my source code as a std::string, the text visualizer in Visual Studio shows it in the wrong encoding. Presumably the visualizer interprets the bytes as Windows (ANSI). How can I force Visual Studio to let me work with UTF-8 strings properly?

std::string message_or_file_path = "...";
auto message = message_or_file_path;

// If the path refers to an existing file, read from that file.
// The explicit ANSI variant is used because the path is a narrow std::string.
if (GetFileAttributesA(message_or_file_path.c_str()) != INVALID_FILE_ATTRIBUTES)
{
    std::ifstream file_stream(message_or_file_path);
    std::string text_file_contents((std::istreambuf_iterator<char>(file_stream)),
        std::istreambuf_iterator<char>());
    message = text_file_contents; // Displayed in wrong encoding
    message = "ʜᴇʟʟᴏ ꜱᴛᴀᴄᴋᴏᴠᴇʀꜰʟᴏᴡ"; // Displayed in wrong encoding
    std::wstring wide_message = L"ʜᴇʟʟᴏ ꜱᴛᴀᴄᴋᴏᴠᴇʀꜰʟᴏᴡ"; // Displayed in correct encoding
}

I tried compiling with the additional command-line option /utf-8 and setting the locale:

std::locale::global(std::locale(""));
std::cout.imbue(std::locale());

Neither of those fixed the encoding issue.

BullyWiiPlaza
  • What is the encoding of the .cpp file? – Guillaume Racicot Jan 31 '20 at 20:12
  • Possible duplicate of [How to set standard encoding in Visual Studio](https://stackoverflow.com/questions/696627/how-to-set-standard-encoding-in-visual-studio) – rustyx Jan 31 '20 at 20:16
  • You should open the `std::ifstream` in binary mode to avoid any data conversions while reading the `char`s. That will at least ensure the `std::string` has the correct bytes. That doesn't mean the *IDE* will display it correctly, though. Otherwise, use `std::wstring` instead, as you already discovered. You can read it with a `std::wifstream` that has a UTF-8 locale `imbue()`'ed into it. Or read the raw bytes first and then use `MultiByteToWideChar()` or `std::wstring_convert` to convert the bytes to `std::wstring`. – Remy Lebeau Jan 31 '20 at 20:16
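
The binary-mode read suggested in the last comment might look like the sketch below. The helper name read_file_bytes is hypothetical, and returning an empty string when the file cannot be opened is just one possible choice for this illustration.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Reads the whole file as raw bytes. Binary mode disables newline
// translation, so the returned string holds exactly the bytes on disk.
// Returns an empty string if the file cannot be opened (an assumption
// made for this sketch; real code may prefer to report the error).
std::string read_file_bytes(const std::string& path)
{
    std::ifstream file_stream(path, std::ios::binary);
    if (!file_stream)
        return {};
    return std::string(std::istreambuf_iterator<char>(file_stream),
                       std::istreambuf_iterator<char>());
}
```

This only guarantees that the std::string contains the file's bytes unchanged. How the debugger or any other consumer interprets those bytes is a separate matter.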

2 Answers

4

According to "What's Wrong with My UTF-8 Strings in Visual Studio?", there are a couple of ways to see the contents of a std::string with UTF-8 encoding.

Let's say you have a variable with the following initialization:

std::string s2 = "\x7a\xc3\x9f\xe6\xb0\xb4\xf0\x9f\x8d\x8c";
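
To confirm that such a string really holds UTF-8 regardless of how the debugger renders it, the bytes can also be inspected in code. This is only a minimal sketch: count_code_points and to_hex are hypothetical helper names, and the counting assumes the input is well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>

// Counts Unicode code points in a UTF-8 string by skipping continuation
// bytes (bit pattern 10xxxxxx). Assumes the input is well-formed UTF-8.
std::size_t count_code_points(const std::string& utf8)
{
    std::size_t count = 0;
    for (unsigned char byte : utf8)
    {
        if ((byte & 0xC0) != 0x80)
            ++count;
    }
    return count;
}

// Formats the raw bytes as lowercase hex pairs separated by spaces.
std::string to_hex(const std::string& bytes)
{
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (unsigned char byte : bytes)
    {
        if (!out.empty())
            out += ' ';
        out += digits[byte >> 4];
        out += digits[byte & 0x0F];
    }
    return out;
}
```

For s2 above, s2.size() is 10 while count_code_points(s2) is 4, because the string encodes z, ß, 水 and 🍌 using 1, 2, 3 and 4 bytes respectively.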

Use a Watch window.

  • Add the variable to Watch.
  • In the Watch window, append ,s8 to the variable name (for example, s2,s8) to display its contents as UTF-8.

Here's what I see in Visual Studio 2015:

[Screenshot: the Watch window]

Use the Command Window.

  • In the Command Window, use ? &s2[0],s8 to display the text as UTF-8.

Here's what I see in Visual Studio 2015:

[Screenshot: the Command Window]

Remy Lebeau
R Sahu
  • This may work for the text visualizer, but it will not correct the code's encoding, so it's only a partial solution. Still, you deserve your upvote. – BullyWiiPlaza Jan 31 '20 at 22:45
  • @BullyWiiPlaza, what do you mean by "the code's encoding"? – R Sahu Jan 31 '20 at 22:54
  • @R Sahu: I mean that the string still does not work correctly when the code processes it. For example, when I copy the Unicode text from a `std::string` object to the clipboard and paste it, it's garbled. With a `std::wstring` version it works fine. – BullyWiiPlaza Feb 02 '20 at 00:09
0

A working solution was simply to rewrite every std::string as a std::wstring and adjust the code logic accordingly, as already hinted at in the question. Now everything works as expected.
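
Input that still arrives as UTF-8 bytes (file contents, for example) needs a conversion step before the std::wstring logic can take over. On Windows, MultiByteToWideChar() with CP_UTF8, as mentioned in the comments on the question, is the usual API for this. The hand-written decoder below is only a portable sketch of what such a conversion does; utf8_to_wide is a hypothetical helper name, and the validation is deliberately simplified.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Decodes UTF-8 into a std::wstring. Where wchar_t is 16 bits (as on
// Windows), code points above U+FFFF become UTF-16 surrogate pairs; where
// wchar_t is 32 bits, each code point is stored directly.
// Simplified: malformed or truncated sequences throw, but overlong
// encodings and encoded surrogates are not rejected.
std::wstring utf8_to_wide(const std::string& utf8)
{
    std::wstring wide;
    std::size_t i = 0;
    while (i < utf8.size())
    {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        std::uint32_t code_point = 0;
        std::size_t length = 0;

        // The lead byte determines the sequence length and its payload bits.
        if (lead < 0x80)                { code_point = lead;        length = 1; }
        else if ((lead & 0xE0) == 0xC0) { code_point = lead & 0x1F; length = 2; }
        else if ((lead & 0xF0) == 0xE0) { code_point = lead & 0x0F; length = 3; }
        else if ((lead & 0xF8) == 0xF0) { code_point = lead & 0x07; length = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + length > utf8.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Each continuation byte contributes six more bits.
        for (std::size_t k = 1; k < length; ++k)
        {
            const unsigned char next = static_cast<unsigned char>(utf8[i + k]);
            if ((next & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            code_point = (code_point << 6) | (next & 0x3F);
        }
        i += length;

        if (code_point > 0x10FFFF)
            throw std::invalid_argument("code point out of range");

        if (sizeof(wchar_t) == 2 && code_point > 0xFFFF)
        {
            // Split into a UTF-16 high/low surrogate pair.
            code_point -= 0x10000;
            wide += static_cast<wchar_t>(0xD800 + (code_point >> 10));
            wide += static_cast<wchar_t>(0xDC00 + (code_point & 0x3FF));
        }
        else
        {
            wide += static_cast<wchar_t>(code_point);
        }
    }
    return wide;
}
```

With a helper like this, the file contents can be read as raw bytes and converted once at the boundary, so the rest of the code only deals with std::wstring.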

BullyWiiPlaza