
I am reading http://utf8everywhere.org/#how.files.

> Never pass `std::string` or `const char*` filename arguments to the fstream family. MSVC CRT does not support UTF-8 arguments, but it has a non-standard extension which should be used as follows: ...
>
> We will have to manually remove the conversion, when MSVC’s attitude to fstream changes.

Emphasis mine.
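
A minimal sketch of the "translate at the boundary" idea for file names, assuming a C++17 compiler. It does not reproduce the MSVC-specific extension the quoted page refers to; it swaps in the standard `std::filesystem::u8path`, which treats its argument as UTF-8 and converts it to the platform's native path encoding (UTF-16 on Windows, the unchanged bytes on Linux). The helper name `open_utf8_ofstream` is hypothetical.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>

// Hypothetical helper: open an output stream for a file whose name is held
// as a UTF-8 encoded std::string.
// std::filesystem::u8path interprets the string as UTF-8 and converts it to
// the native path representation (wchar_t/UTF-16 on Windows, plain bytes on
// POSIX). The std::ofstream constructor taking a std::filesystem::path is
// standard since C++17, so no vendor extension is needed.
std::ofstream open_utf8_ofstream(const std::string& utf8_name) {
    return std::ofstream(std::filesystem::u8path(utf8_name));
}
```

In C++20 `u8path` is deprecated in favour of building a `std::filesystem::path` from a `char8_t` string such as `u8"..."`, but it still compiles, typically with a deprecation warning.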

Does the emphasized text indicate that the implementation of the fstream family differs between platforms? Does it indicate that on a Linux system I can safely pass a UTF-8 `std::string` or `const char*` to the fstream family? Why doesn't the C++ documentation at http://en.cppreference.com/w/cpp/io/basic_fstream mention that platform difference?

As a newbie I need some confirmation, so I would really appreciate it if you could answer my questions directly while adding some explanation.

Rick
  • Because cppreference follows the Standard, and MSVC doesn't give a damn about the Standard :) – Quentin May 31 '18 at 11:46
  • C++'s language specification does not guarantee support for Unicode. Windows uses UTF-16 internally anyway. – Dai May 31 '18 at 11:47
  • This referenced HowTo is too outdated: *Last modified: 1970-01-01* – 273K May 31 '18 at 11:51
  • @S.M. I think that's a joke, since the page mentions Windows NT (1993) among other things, and UTF-8 was created around 1993. – Rick May 31 '18 at 11:53
  • The implementation of the whole standard library differs between platforms. – melpomene May 31 '18 at 12:01
  • The standard doesn't know a thing about the encoding of file names, so it's squarely implementation-defined. Linux treats file names as opaque NUL-terminated, `/` separated binary blobs; Windows does pretty much the same but with 16-bit characters (nominally it's UTF-16, but correctness is not enforced). The "narrow-string" APIs that are provided on Win32 work in the "local code page" (to perform the conversion to UTF-16), which however cannot be reliably set to UTF-8, so if a path contains a character that cannot be represented in the local CP, you cannot specify it through regular `fopen`. – Matteo Italia May 31 '18 at 12:07
  • Background reading: https://stackoverflow.com/questions/50613451/the-proper-way-to-handle-unicode-with-c-in-2018 and https://stackoverflow.com/questions/17103925/how-well-is-unicode-supported-in-c11 – Richard Critten May 31 '18 at 12:16
  • utf8everywhere.org encourages you to use UTF-8 as your internal string encoding, but acknowledges you will need to translate at your program's boundary. This includes file names, as the native OS may not support UTF-8. – Richard Critten May 31 '18 at 12:22
  • std::wfstream works fine on Windows. Making code hard to port does seem to be intentional sometimes. – Hans Passant May 31 '18 at 13:08
  • The implementation of any or all parts of the C++ standard library potentially differs between platforms. Compiler vendors are like that - they compete on optimising their compiler and their library in different ways that they hope suits programmers better than products of their competitors. The standard requires that the resultant behaviour is consistent (except where behaviour is undefined, unspecified, implementation-defined, etc) but does not require specific means of implementation. – Peter May 31 '18 at 14:05
  • @RichardCritten Come on man, I know that and I am using C++, not C. Anyway, thanks for your comments and everybody else's (really useful for me). – Rick May 31 '18 at 16:39
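
The comments about Linux treating file names as opaque byte strings can be checked directly. The sketch below assumes a POSIX system such as Linux: it passes a UTF-8 `std::string` straight to `std::ofstream` with no conversion, then looks for a directory entry whose raw bytes match. It is deliberately not portable: on Windows `path::native()` is a wide string, and a narrow name would be interpreted in the local code page instead. The function name `create_and_find_raw` is hypothetical.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>

// POSIX-only sketch (assumption: Linux or similar). A file name there is an
// opaque, NUL-terminated byte string, so the UTF-8 bytes of `utf8_name`
// reach the file system unchanged. Returns true if, after creating the
// file, a directory entry in `dir` has exactly those bytes as its name.
bool create_and_find_raw(const std::string& dir, const std::string& utf8_name) {
    // Narrow string passed straight to the stream: no encoding conversion.
    {
        std::ofstream out(dir + "/" + utf8_name);
        if (!out) {
            return false;
        }
        out << "hello\n";
    }  // stream is closed at the end of this scope

    // On POSIX, path::native() is a std::string holding the raw bytes.
    for (const auto& entry : std::filesystem::directory_iterator(dir)) {
        if (entry.path().filename().native() == utf8_name) {
            return true;
        }
    }
    return false;
}
```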

1 Answer

First of all, the standard library is provided as part of a compiler environment. Since there are many compiler vendors, there are many different implementations of the standard library; all of them are simply required to conform to the standard. So yes, Microsoft and gcc ship different implementations of the standard library. That being said, you can install a full gcc development environment on Windows with MinGW, so you are not stuck with MSVC even there.
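
To see the "different implementations" point in practice, each standard library defines its own identifying macros once any standard header is included. A small sketch, relying only on the vendor macros `_LIBCPP_VERSION` (libc++), `__GLIBCXX__` (libstdc++, which MinGW also ships) and `_MSC_VER` (the MSVC compiler, used here as a proxy for Microsoft's library); the function name and the returned labels are made up for illustration.

```cpp
#include <cassert>
#include <string>  // any standard header pulls in the library's config macros

// Report which standard library this translation unit was compiled against.
// libc++ is checked first because clang can pair it with other toolchains.
std::string standard_library_name() {
#if defined(_LIBCPP_VERSION)
    return "libc++ (LLVM)";
#elif defined(__GLIBCXX__)
    return "libstdc++ (GCC, including MinGW)";
#elif defined(_MSC_VER)
    return "Microsoft STL (MSVC)";
#else
    return "unknown";
#endif
}
```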

But the problem of displaying UTF-8 text is not only related to the stream library implementation but also to the underlying terminal window. The part that cannot correctly display UTF-8 text is the Windows console that hosts the cmd.exe shell, and that part does not depend on the development tool used.

Serge Ballesta