
I decided to write a simple example:

#include <iostream>

int main()
{
    std::cout << u8"это строка6" << std::endl;
    return 0;
}

I executed the following command in the console: chcp 65001

Program output:

��то строка6

Why is the first character not displayed correctly? My guess is that code page 65001 uses a BOM and that the first character is being read as the BOM. Is that true?

1 Answer


Well, the entire standard I/O library is dodgy with that code page. Here's another test program (\xe2\x86\x92 is the arrow → encoded in UTF-8):

#include <stdio.h>

int main(void)
{
    char s[] = "\xe2\x86\x92 a \xe2\x86\x92 b\n";
    int l = (int) sizeof(s) - 1;
    int wr = fwrite(s, 1, l, stdout);
    printf("%d/%d written\n", wr, l);
    return 0;
}

And its output:

��� a → b
10/12 written

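Note that the first character is again replaced by ��� (the arrow is 3 bytes in UTF-8), and the fwrite call returns the number of characters written to the console rather than the number of bytes. This violates the C standard: fwrite must return the number of elements successfully written, which here, with an element size of 1, is the number of bytes. It will break every program that uses fwrite or related functions correctly, because a return value smaller than the requested count looks like a failed write (for instance, try to print "☺☺☺☺☺☺☺☺☺☺☺☺" with Python 3.4).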

So your only options to reliably output Unicode text are Windows-specific (unless these issues are fixed in the latest version of MSVC):

roeland