Here is a simple example compiled in MS Visual Studio:

#include<string>
#include<iostream>

using namespace std;

int main()
{
   cout << static_cast<int>('ą') << endl; // -71
   return 0;
}

Why does this cout print -71, as if MS Visual Studio were using Windows-1250, when as far as I know it uses UTF-8?

Huuulk99
  • Does this answer your question? [Output unicode strings in Windows console app](https://stackoverflow.com/questions/2492077/output-unicode-strings-in-windows-console-app) – Christof Wollenhaupt May 30 '22 at 09:32

1 Answer

Your source file is saved in Windows-1250, not UTF-8, so the byte stored between the two single quotes is 0xB9 (see Windows-1250 table). 0xB9 taken as a signed 8-bit value is -71.

Save your file in UTF-8 encoding and you'll get a different answer. I get 50309, which is 0xC485. Since UTF-8 is a multibyte encoding, ą no longer fits in a single char, and the value of such a character literal is implementation-defined. It is better to output the bytes of an explicit UTF-8 string, save the source as UTF-8, and tell the compiler explicitly that the source encoding is UTF-8:

test.cpp, saved in UTF-8 encoding and compiled with the /utf-8 switch in MSVC:

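#include <cstdint>
#include <iostream>
#include <string>

using namespace std;

int main()
{
    // Before C++20, u8"" is a char-based UTF-8 string. In C++20 it is
    // char8_t-based and will not initialize std::string directly.
    string s {u8"ą马"};
    for (auto c : s)
        // Cast through uint8_t so bytes >= 0x80 are not sign-extended.
        cout << hex << static_cast<int>(static_cast<uint8_t>(c)) << endl;
    return 0;
}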

Output:

c4
85
e9
a9
ac

Note that C4 85 are the correct UTF-8 bytes for ą, and E9 A9 AC are the correct bytes for the Chinese character 马 (horse).

Mark Tolonen