I recently discovered that Windows 10/11 has a beta option under the region settings (system locale) called "Use Unicode UTF-8 for worldwide language support". When it is enabled, all the ANSI Win32 API functions treat strings as UTF-8. Sure enough, with it enabled you can compile the following in MSVC:

#include <iostream>

int main() {
    std::cout << "Hello, World! こんにちは世界!" << std::endl;
    //prints "Hello, World! こんにちは世界!"
}

I then read that you don't have to enable this system-wide and can instead compile your program with the /utf-8 flag. So with the beta option disabled and the /utf-8 flag added to my project:

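#include <iostream>

int main() {
    // Note: from C++20 on, u8 literals are char8_t and cannot be streamed to std::cout.
    std::cout << u8"Hello, World! こんにちは世界!" << std::endl;
    //prints "Hello, World! こんにちは世界!"
}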

and

#include <clocale>
#include <iostream>

int main() {
    std::setlocale(LC_ALL, "en_US.utf-8");
    std::cout << "Hello, World! こんにちは世界!" << std::endl;
    //prints "Hello, World! ???????!"
}

I also tried adding u8 to the string literal, but it makes no difference.

Chris_F
  • You're conflating two completely different things. One tells the compiler to use UTF-8 for the source code, the other tells the system to use UTF-8 for I/O. You probably want to use both of them for consistent output. – Mark Ransom Jun 23 '23 at 02:35
  • @MarkRansom Well, enabling the beta option is potentially a bad idea, since it can change the behavior of programs that aren't expecting it. Plus it requires a user to go into the advanced settings and enable an unstable feature. This is the kind of thing you'd want to be able to compile into a program. – Chris_F Jun 23 '23 at 03:37
  • All the program can do is output bytes. It is up to the OS how it wants to interpret those bytes and turn them into characters. A compiler switch isn't going to affect that. – Mark Ransom Jun 23 '23 at 12:12
  • @Mark A compiler flag determines whether or not your application opens in a console window. It's not remotely inconceivable to embed something to tell the OS to use UTF-8. – Chris_F Jun 24 '23 at 03:57
  • Oh sure, it's possible. Python 3.6 proved it by completely bypassing the byte-oriented console I/O with the native Windows Unicode API, sidestepping the problem neatly. But to the best of my knowledge the same has never been done for C++. – Mark Ransom Jun 24 '23 at 04:10
  • As I replied before, the option modifies the registry, and it is still a beta option after **4 years**. `SetConsoleOutputCP` sets the output code page of the terminal. For the GUI, however, I did not find any related functions in the Windows SDK. To achieve the above effect there, you need to change the registry from the program or use the Unicode API. – Minxin Yu - MSFT Jun 26 '23 at 02:08
  • You can use an app manifest to enable UTF-8 in Win32 ANSI APIs on a per-application basis instead of system-wide. But you also need to make sure your ANSI strings are actually UTF-8 encoded, which is where things like the `/utf-8` flag and the `u8` prefix come into play in your source code. – Remy Lebeau Jun 29 '23 at 19:42
  • @RemyLebeau apparently setting UTF-8 in the manifest file only affects the ANSI code page, which doesn't apply to the console. So you need to add the manifest and also use `SetConsoleOutputCP(CP_UTF8)`. The problem is that even if you do all that, unicode input through the console will still be broken. So at best, this is like a 70% solution to UTF-8 in Windows. – Chris_F Jun 29 '23 at 19:48
  • @Chris_F good point. And there are TONS of questions on StackOverflow regarding Unicode/UTF-8 I/O on the console. – Remy Lebeau Jun 29 '23 at 19:50
  • "It's not remotely inconceivable to embed something to tell the OS to use UTF-8" Indeed that might seem like a good idea, but that's not what happens in practice. – n. m. could be an AI Jun 29 '23 at 20:02
  • "it can change the behavior of programs that aren't expecting it" These are exactly the programs you don't want to touch with a six foot pole, so nothing of value is lost. – n. m. could be an AI Jun 29 '23 at 20:07
  • @n.m.willseey'allonReddit Except that is what happens in practice, as we've already discussed. The application manifest tells Windows to give the process a UTF-8 ANSI code page if embedded in the executable. – Chris_F Jun 29 '23 at 20:35
  • A compiler flag and the application manifest are two very, very, VERY different things. – n. m. could be an AI Jun 29 '23 at 20:57
  • Hi, @Chris_F I updated my answer. Adding `#pragma execution_character_set("utf-8")` and using the manifest makes `MessageBoxA` display Japanese. – Minxin Yu - MSFT Jul 04 '23 at 06:09
  • @MarkRansom It works fine with C++, just not with MSVC. – n. m. could be an AI Jul 04 '23 at 07:08
  • @n.m.willseey'allonReddit which version of C++ does that then? This is the first I've heard of it. – Mark Ransom Jul 04 '23 at 12:49
  • @MarkRansom gcc with UCRT or Cygwin runtime. – n. m. could be an AI Jul 04 '23 at 15:09

1 Answer

Use #pragma execution_character_set("utf-8") together with SetConsoleOutputCP(CP_UTF8), e.g.:

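#include <iostream>

#include <Windows.h>
// Store narrow string literals as UTF-8 in the compiled program.
#pragma execution_character_set("utf-8")

int main() {
    // Tell the console to interpret the program's output bytes as UTF-8.
    SetConsoleOutputCP(CP_UTF8);
    std::cout << "Hello, World! こんにちは世界!" << std::endl;
}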

Update:

As Remy Lebeau said, you can use an app manifest.

yourapp.manifest:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<assembly manifestVersion="1.0" xmlns="urn:schemas-microsoft-com:asm.v1">
  <assemblyIdentity type="win32" name="Microsoft.Windows.Common-Controls" version="6.0.0.0"/>
  <application>
    <windowsSettings>
      <activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage>
    </windowsSettings>
  </application>
</assembly>

Add the manifest in Visual Studio under Project Properties -> Manifest Tool -> Input and Output -> Additional Manifest Files: yourapp.manifest

Or from the Visual Studio Command Prompt:

mt.exe -manifest yourapp.manifest -outputresource:yourapp.exe;#1

Minxin Yu - MSFT
  • This works for console output, but `MessageBoxA` for instance doesn't work. – Chris_F Jun 23 '23 at 03:32
  • But there is already MessageBoxW for Unicode: `::MessageBoxW(m_hWnd, L"Hello, World! こんにちは世界", L"", MB_OK);` – Minxin Yu - MSFT Jun 23 '23 at 05:41
  • The scope of this question was using Windows' new UTF-8 support for the ANSI functions so that you don't have to perform UTF-8 to UTF-16 conversions. – Chris_F Jun 23 '23 at 05:45
  • According to the linked question, [What does "Beta: Use Unicode UTF-8 for worldwide language support" actually do?](https://stackoverflow.com/questions/56419639/what-does-beta-use-unicode-utf-8-for-worldwide-language-support-actually-do), the beta option modifies the registry values ACP, MACCP, and OEMCP. The checkbox forces them to UTF-8 (code page 65001). I am able to print the characters correctly after modifying them. – Minxin Yu - MSFT Jun 23 '23 at 07:58
  • [Use an app manifest](https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page) to enable UTF-8 behavior in `A` APIs, then `MessageBoxA()` will interpret the string as UTF-8 (provided it is actually UTF-8 encoded properly). – Remy Lebeau Jun 29 '23 at 19:44