
I don't quite understand how std::setlocale works.

Here is my simple program:

// main.cpp

#include <iostream>
#include <clocale>
#include <string>   // std::string is used below; don't rely on <iostream> pulling it in

int main(void) {
    std::string str = u8"Привет, мир";
    std::cout << str << std::endl;

    std::setlocale(LC_ALL, ".UTF8");
    std::cout << str << std::endl;

    return 0;
}
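std::setlocale reports failure by returning a null pointer and leaving the current locale unchanged, so the call above can silently do nothing if ".UTF8" isn't supported. A checked variant might look like the sketch below. The helper name `set_first_available_locale` is hypothetical, and the fallback names mentioned afterwards are common POSIX spellings whose availability depends on the installed system (an assumption, not a guarantee).

```cpp
#include <clocale>
#include <initializer_list>
#include <string>

// Hypothetical helper: tries each locale name in order and returns the
// name std::setlocale accepted, or an empty string if none was accepted.
// On failure std::setlocale returns nullptr and keeps the old locale.
std::string set_first_available_locale(std::initializer_list<const char*> candidates) {
    for (const char* name : candidates) {
        if (const char* result = std::setlocale(LC_ALL, name)) {
            // Copy immediately: the buffer behind the returned pointer
            // may be overwritten by a later setlocale call.
            return result;
        }
    }
    return {};
}
```

For example, `set_first_available_locale({".UTF8", "C.UTF-8", "en_US.UTF-8"})` tries the MSVC spelling used in the question first, then two names commonly found on Linux (assumed, not guaranteed, to be installed).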

Prerequisites

The code is compiled with Visual Studio 2022 (version 17.3.6), CL version 19.33.31630.

The program runs on Windows 10 (21H2 19044.2728) in a PowerShell terminal with CP1251 encoding.

PS> $PSVersionTable.PSVersion.ToString()
5.1.19041.2673

PS> [Console]::OutputEncoding

IsSingleByte      : True
BodyName          : koi8-r
EncodingName      : Кириллица (Windows)
HeaderName        : windows-1251
WebName           : windows-1251
WindowsCodePage   : 1251
IsBrowserDisplay  : True
IsBrowserSave     : True
IsMailNewsDisplay : True
IsMailNewsSave    : True
EncoderFallback   : System.Text.InternalEncoderBestFitFallback
DecoderFallback   : System.Text.InternalDecoderBestFitFallback
IsReadOnly        : False
CodePage          : 1251 

Here is the result of execution:

PS D:\VS\Projects\playground\x64\Debug> .\playground.exe
Привет, мир
Привет, мир

Question

The gibberish on the first line is fine; I expected that.

But why is the second line (after the setlocale call) not gibberish?

As far as I understand the setlocale documentation, this function affects only locale-dependent functions, such as std::toupper, std::isalpha, etc. The documentation says nothing about changing the stdout encoding.
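As a sketch of the kind of locale-dependent behaviour the documentation does describe, assuming a standard C++17 compiler: printf-family formatting depends on LC_NUMERIC (it decides the decimal separator), and the global C locale can be switched and restored around a piece of code. The names `ScopedCLocale`, `format_double` and `current_c_locale` are hypothetical helpers for illustration.

```cpp
#include <clocale>
#include <cstdio>
#include <string>

// Hypothetical RAII guard: switches the global C locale and restores the
// previous one when it goes out of scope.
class ScopedCLocale {
public:
    explicit ScopedCLocale(const char* name) {
        // Copy the current name: the pointer returned by setlocale may be
        // overwritten by later setlocale calls.
        const char* prev = std::setlocale(LC_ALL, nullptr);
        previous_ = prev ? prev : "C";
        ok_ = std::setlocale(LC_ALL, name) != nullptr;
    }
    ~ScopedCLocale() { std::setlocale(LC_ALL, previous_.c_str()); }
    ScopedCLocale(const ScopedCLocale&) = delete;
    ScopedCLocale& operator=(const ScopedCLocale&) = delete;
    bool ok() const { return ok_; }

private:
    std::string previous_;
    bool ok_ = false;
};

// Locale-dependent: with LC_NUMERIC set to a locale that uses ',' as the
// decimal separator, the result would contain ',' instead of '.'.
std::string format_double(double value) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.1f", value);
    return buf;
}

// Name of the current global C locale ("C" at program startup).
std::string current_c_locale() {
    const char* name = std::setlocale(LC_ALL, nullptr);
    return name ? name : "";
}
```

Nothing in this interface mentions the console or stdout encoding, which is what makes the observed output surprising.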

I thought that std::cout just writes the bytes of a std::string to stdout, but it seems to behave more cleverly than that.
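One way to test the "it just writes the bytes" assumption is to dump the string's bytes before and after the setlocale call: if they are identical, any change must happen later, on the way to the console. A minimal sketch; `hex_bytes` is a hypothetical helper.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: returns the bytes of s as lowercase hex pairs
// separated by spaces, e.g. "\xD0\x9F" -> "d0 9f".
std::string hex_bytes(const std::string& s) {
    std::string out;
    char buf[4];  // two hex digits plus the terminating '\0'
    for (unsigned char c : s) {
        if (!out.empty()) out += ' ';
        std::snprintf(buf, sizeof buf, "%02x", c);
        out += buf;
    }
    return out;
}
```

Printing `hex_bytes(str)` before and after `setlocale` should show the same UTF-8 bytes both times, starting with `d0 9f d1 80` (the UTF-8 encoding of "Пр").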

It looks as if std::cout checks the terminal encoding and, if it differs from the encoding set by setlocale, automatically converts the bytes from the "program locale" to the "terminal locale".
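To make that hypothesis concrete, here is a deliberately tiny, hypothetical converter showing what "converting from the program locale (UTF-8) to the terminal encoding (CP1251)" means at the byte level. It handles only ASCII and the basic Cyrillic letters plus Ё/ё, and it is not a claim about how std::cout or the Microsoft CRT is actually implemented; `utf8_to_cp1251_subset` is an illustrative name.

```cpp
#include <cstddef>
#include <string>

// Hypothetical illustration only (not the CRT's implementation):
// converts UTF-8 to Windows-1251 for ASCII and Cyrillic А..я, Ё, ё.
// Other 2-byte characters become a single '?'; every byte of longer or
// malformed sequences becomes '?'.
std::string utf8_to_cp1251_subset(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size();) {
        unsigned char b0 = static_cast<unsigned char>(in[i]);
        if (b0 < 0x80) {  // ASCII is identical in both encodings
            out += static_cast<char>(b0);
            ++i;
            continue;
        }
        if ((b0 & 0xE0) == 0xC0 && i + 1 < in.size()) {  // 2-byte sequence
            unsigned char b1 = static_cast<unsigned char>(in[i + 1]);
            if ((b1 & 0xC0) == 0x80) {
                unsigned cp = ((b0 & 0x1Fu) << 6) | (b1 & 0x3Fu);
                if (cp >= 0x410 && cp <= 0x44F)          // А..я -> 0xC0..0xFF
                    out += static_cast<char>(0xC0 + (cp - 0x410));
                else if (cp == 0x401)                    // Ё -> 0xA8
                    out += static_cast<char>(0xA8);
                else if (cp == 0x451)                    // ё -> 0xB8
                    out += static_cast<char>(0xB8);
                else
                    out += '?';
                i += 2;
                continue;
            }
        }
        out += '?';  // unsupported or malformed byte
        ++i;
    }
    return out;
}
```

Fed the UTF-8 bytes of "Привет", it returns the bytes CF F0 E8 E2 E5 F2, which a CP1251 terminal displays as "Привет".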

Is this behaviour cross-platform, and is it described in the standard?

Pavel.Zh
  • Related: https://stackoverflow.com/a/63454192/4641116 – Eljay Apr 07 '23 at 15:51
    `setlocale` has no effect on any terminal encoding. `setlocale` tells the C++ program what its locale is, for the purposes of producing output. Hopefully it matches the terminal's actual encoding, in which case everything looks OK. If not, gibberish may result. The End. – Sam Varshavchik Apr 07 '23 at 15:55
  • @SamVarshavchik, here is the thing. I got gibberish the first time. That was expected, because the terminal uses CP1251. Then I set the locale to "UTF-8", which obviously does not match the terminal's encoding, printed the same string, and got correct output. But the string is **the same**. So it seems that either `std::cout` silently converts my string from `utf-8` to `cp1251`, or `setlocale` has changed the terminal encoding. – Pavel.Zh Apr 07 '23 at 16:05
  • You may also need `SetConsoleOutputCP( 65001 );`, as per several other StackOverflow answers. – Eljay Apr 07 '23 at 16:16
  • Several other pages suggest `_setmode(_fileno(stdout), _O_U8TEXT);`. This seems to be an endemic problem. – Eljay Apr 07 '23 at 16:34
  • @Eljay, thank you for your help, but the main problem is not fixing the gibberish. It is all about why `setlocale` fixes it. It is not logical, but the link in your first comment was helpful. It is part of Microsoft-specific behaviour: `setlocale` makes their internal API functions interpret `const char*` as UTF-8. – Pavel.Zh Apr 07 '23 at 17:08
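The `SetConsoleOutputCP(65001)` workaround from the comments could be wrapped as in the sketch below, which makes the console interpret the program's output bytes as UTF-8 instead of relying on setlocale. It assumes a Windows console (65001 is `CP_UTF8`); `enable_utf8_console_output` is a hypothetical name, and on other platforms the function does nothing and simply assumes the terminal already expects UTF-8, which is an assumption rather than a guarantee.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical wrapper: on Windows, switch the console output code page to
// UTF-8 (CP_UTF8 == 65001) so UTF-8 bytes written to stdout display
// correctly. Returns false if the Windows call fails.
// Elsewhere: a no-op that assumes the terminal already expects UTF-8.
bool enable_utf8_console_output() {
#ifdef _WIN32
    return SetConsoleOutputCP(CP_UTF8) != 0;
#else
    return true;
#endif
}
```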
