As we know, wchar_t is 2 bytes on Windows but 4 bytes on macOS/Linux.

I am trying to read a file that contains a Unicode string, but the string is read incorrectly (it shows unknown symbols).

    basic_ifstream<wchar_t> file("/Documents/file.txt", ios_base::ate); // or wifstream
    if (!file.is_open()) {
        cout << "Cannot open the file." << endl;
    }
    streamsize size = file.tellg();
    file.seekg(0);
    wstring str(size, 0); // or (size / 4, 0)
    file.read(reinterpret_cast<wchar_t*>(&str[0]), size);
    file.close();

When I inspect the string in the debugger, it contains unknown symbols instead of the file's text.

What is the correct way to read the contents of a Unicode file into a wchar_t string?

Lion King

2 Answers

One way to read the file correctly, assuming it is UTF-8 encoded, is:

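    #include <codecvt> // for codecvt_utf8 (deprecated since C++17, but still available)
    #include <fstream> // for wifstream
    #include <locale>  // for locale
    #include <sstream> // for wstringstream
    #include <string>  // for wstring
    using namespace std;

    int main() {
        wifstream wif;
        // locale::empty() is an MSVC extension; copying the stream's current locale is portable.
        // codecvt_utf8 yields one code point per wchar_t where wchar_t is 4 bytes (macOS/Linux);
        // on Windows (2-byte wchar_t) use codecvt_utf8_utf16<wchar_t> instead.
        // Imbue before opening so the facet is in place before any input is read.
        wif.imbue(locale(wif.getloc(), new codecvt_utf8<wchar_t>));
        wif.open("file.txt");
        wstringstream wss;
        wss << wif.rdbuf(); // read the whole file, decoding UTF-8 along the way
        wstring wstr = wss.str();
        return 0;
    }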
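
Since `<codecvt>` is deprecated as of C++17, an alternative is to read the raw bytes with a plain `ifstream` and decode them by hand. The following is only a minimal sketch and not part of the approach above: the helper names `utf8_to_wstring` and `read_utf8_file` are hypothetical, the file is assumed to be UTF-8, a leading BOM is not stripped, and overlong or otherwise invalid sequences are not fully validated (unexpected bytes simply become U+FFFD).

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: decodes UTF-8 bytes into a wstring.
// Stores one code point per element when wchar_t is 4 bytes,
// and UTF-16 surrogate pairs when wchar_t is 2 bytes.
std::wstring utf8_to_wstring(const std::string& bytes) {
    std::wstring out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp;
        std::size_t extra; // number of continuation bytes expected
        if (lead < 0x80)              { cp = lead;        extra = 0; }
        else if ((lead >> 5) == 0x06) { cp = lead & 0x1F; extra = 1; }
        else if ((lead >> 4) == 0x0E) { cp = lead & 0x0F; extra = 2; }
        else if ((lead >> 3) == 0x1E) { cp = lead & 0x07; extra = 3; }
        else                          { cp = 0xFFFD;      extra = 0; }
        ++i;
        for (std::size_t k = 0; k < extra; ++k) {
            // A missing or malformed continuation byte ends the sequence.
            if (i >= bytes.size() ||
                (static_cast<unsigned char>(bytes[i]) & 0xC0) != 0x80) {
                cp = 0xFFFD;
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i]) & 0x3F);
            ++i;
        }
        if (sizeof(wchar_t) >= 4 || cp < 0x10000) {
            out.push_back(static_cast<wchar_t>(cp));
        } else {
            // 2-byte wchar_t: split into a UTF-16 surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

// Hypothetical helper: reads a whole file as bytes, then decodes it.
// Returns an empty string if the file cannot be opened.
std::wstring read_utf8_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::string bytes((std::istreambuf_iterator<char>(in)),
                      std::istreambuf_iterator<char>());
    return utf8_to_wstring(bytes);
}
```

With these helpers, `std::wstring wstr = read_utf8_file("file.txt");` would play the same role as the `wstringstream` code above, without depending on the stream's locale.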
gera verbun

On macOS, the correct way is to use the one-line Cocoa method that reads the contents of a file into an NSString and takes care of the character encoding.

gnasher729