I have a file whose line endings are in the Windows style (\r\n); it is encoded in UCS-2 little endian.
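You can reproduce this without the original file by generating the bytes directly. The sketch below makes two assumptions. First, the file holds Apple, Banana and Cherry, which is inferred from the output further down (the code prints the lines in reverse order). Second, every character fits in one 16-bit unit, as UCS-2 requires. `make_ucs2le_bytes` is a hypothetical helper, not part of any library.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode a BMP-only string as UCS-2 little endian,
// prefixed with the byte order mark FF FE. Line endings are whatever the
// caller puts in the input (e.g. u"\r\n").
std::string make_ucs2le_bytes(const std::u16string& text) {
    std::string bytes = "\xFF\xFE";  // BOM for UTF-16/UCS-2 little endian
    for (char16_t unit : text) {
        bytes.push_back(static_cast<char>(unit & 0xFF));         // low byte first
        bytes.push_back(static_cast<char>((unit >> 8) & 0xFF));  // then high byte
    }
    return bytes;
}
```

To write the test file, open `std::ofstream out("fruit.txt", std::ios::binary);` and then run `out << make_ucs2le_bytes(u"Apple\r\nBanana\r\nCherry\r\n");`. The resulting file should start with FF FE 41 00, and every line should end in 0D 00 0A 00. Binary mode stops the Windows runtime from turning each 0A byte into 0D 0A.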
Say this is my file, fruit.txt (UCS-2 little endian):
So I open it in a std::wifstream and try to parse the contents:
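#include <cerrno>
#include <cstring>
#include <forward_list>
#include <fstream>
#include <iostream>
#include <stdexcept>
#include <string>

// open the file
std::wifstream file("fruit.txt");
if( ! file.is_open() ) throw std::runtime_error(std::strerror(errno));
// create container for the lines
std::forward_list<std::string> lines;
// Add each line to the front of the container (this reverses their order)
std::wstring line;
while(std::getline(file,line)) lines.emplace_front(wstring_to_string(line));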
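Here is one plausible explanation. It assumes that the MinGW runtime's default "C" locale turns each byte into one wchar_t, and the ÿþ in the output suggests it does. If so, the stream never decodes UTF-16. The sketch below simulates that behavior on an in-memory byte string. `simulate_c_locale_getline` is a hypothetical name, and the function does not call the real stream.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical simulation: widen every byte to its own wchar_t (what a
// byte-oriented "C" locale is assumed to do) and split on L'\n' like getline.
std::vector<std::wstring> simulate_c_locale_getline(const std::string& bytes) {
    std::vector<std::wstring> lines;
    std::wstring current;
    for (unsigned char byte : bytes) {
        wchar_t wc = static_cast<wchar_t>(byte);  // 0xFF -> U+00FF, 0x00 -> U+0000
        if (wc == L'\n') {
            lines.push_back(current);  // getline discards the delimiter
            current.clear();
        } else {
            current.push_back(wc);
        }
    }
    if (!current.empty()) lines.push_back(current);  // trailing partial line
    return lines;
}
```

Under this assumption, the first "line" is U+00FF, U+00FE, 'A', U+0000, and so on. wstring_to_string then encodes U+00FF and U+00FE as the UTF-8 bytes C3 BF and C3 BE, which display as "ÿþ". The embedded U+0000 characters are usually invisible on a console. Separately, emplace_front inserts each line at the front of the forward_list, which is why Cherry is printed first.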
If I try to print to cout...
// Printing to cout
for( auto it = lines.cbegin(); it != lines.cend(); ++it )
std::cout << *it << std::endl;
...This is what it outputs:
Cherry
Banana
ÿþApple
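The ÿþ prefix is the byte order mark FF FE, read as the two code points U+00FF and U+00FE. A common first step is to inspect the leading bytes before choosing a decoder. The sketch below assumes the raw file contents are already in a std::string, and the enum and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical classification of a byte order mark at the start of a buffer.
enum class Bom { None, Utf8, Utf16LE, Utf16BE };

Bom detect_bom(const std::string& bytes) {
    auto at = [&](std::size_t i) { return static_cast<unsigned char>(bytes[i]); };
    if (bytes.size() >= 3 && at(0) == 0xEF && at(1) == 0xBB && at(2) == 0xBF)
        return Bom::Utf8;
    if (bytes.size() >= 2 && at(0) == 0xFF && at(1) == 0xFE)
        return Bom::Utf16LE;  // note: also the start of a UTF-32 LE BOM (FF FE 00 00)
    if (bytes.size() >= 2 && at(0) == 0xFE && at(1) == 0xFF)
        return Bom::Utf16BE;
    return Bom::None;
}
```

For fruit.txt this should return Bom::Utf16LE. After that, you can skip the first two bytes and decode the rest as 16-bit little-endian units.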
Worse yet, if I open it in Notepad++, this is what it looks like:
I can sort of rectify this by forcibly converting the encoding back to UCS-2, which results in this:
My wstring_to_string function is defined as follows:
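#include <codecvt>
#include <locale>
#include <string>

// Convert a wide string holding UTF-16 code units (as wchar_t does on Windows) to UTF-8
std::string wstring_to_string( const std::wstring& wstr ) {
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> convert;
    return convert.to_bytes(wstr);
}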
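std::wstring_convert and std::codecvt_utf8_utf16 were deprecated in C++17. This conversion also only gives the right result once its input really is UTF-16. The width of wchar_t depends on the platform: 16 bits on Windows, 32 bits on Linux. To avoid both that and the deprecated API, you can write a UTF-16 to UTF-8 encoder over char16_t by hand. This is only a sketch. The name `utf16_to_utf8` is hypothetical, and replacing unpaired surrogates with U+FFFD is one policy among several.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical alternative to wstring_to_string that works on char16_t,
// so its behavior does not depend on the size of wchar_t.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // High surrogate followed by low surrogate: combine into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // unpaired surrogate: substitute the replacement character
        }
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

On Windows, where wchar_t holds UTF-16 code units, a std::wstring line can be passed as std::u16string(line.begin(), line.end()). With a 32-bit wchar_t, that copy would truncate code points above U+FFFF.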
What in the world is going on here? How can I get a normal UTF-8 string? I have also tried the approach from How to read utf-16 file into utf-8 std::string line by line, but imbuing the std::wifstream first results in no output at all. Can someone please point me to the best way to convert UCS-2 LE data to readable UTF-8?
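Instead of imbuing a locale, you can stay within the standard library and decode the bytes yourself. The steps are:

1. Open the file in binary mode with a plain std::ifstream.
2. Skip the FF FE byte order mark.
3. Assemble each pair of bytes into a char16_t, low byte first.
4. Split on U+000A and drop a trailing U+000D.

`read_utf16le_lines` is a hypothetical name. The sketch assumes the data is little endian, and it ignores a trailing odd byte. It returns UTF-16 lines that still need a UTF-16 to UTF-8 step. On Windows, where wchar_t is 16 bits, that step could be wstring_to_string(std::wstring(line.begin(), line.end())).

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical reader: decodes UTF-16LE (or UCS-2 LE) from a stream opened in
// binary mode and returns one std::u16string per line, without CR/LF.
std::vector<std::u16string> read_utf16le_lines(std::istream& in) {
    std::string bytes((std::istreambuf_iterator<char>(in)),
                      std::istreambuf_iterator<char>());
    std::size_t pos = 0;
    if (bytes.size() >= 2 && static_cast<unsigned char>(bytes[0]) == 0xFF &&
        static_cast<unsigned char>(bytes[1]) == 0xFE) {
        pos = 2;  // skip the little-endian byte order mark
    }
    std::vector<std::u16string> lines;
    std::u16string current;
    for (; pos + 1 < bytes.size(); pos += 2) {  // a trailing odd byte is ignored
        auto lo = static_cast<unsigned char>(bytes[pos]);
        auto hi = static_cast<unsigned char>(bytes[pos + 1]);
        char16_t unit = static_cast<char16_t>(lo | (hi << 8));
        if (unit == u'\n') {
            if (!current.empty() && current.back() == u'\r') current.pop_back();
            lines.push_back(current);
            current.clear();
        } else {
            current.push_back(unit);
        }
    }
    if (!current.empty()) lines.push_back(current);  // last line without a newline
    return lines;
}
```

Usage would look like `std::ifstream file("fruit.txt", std::ios::binary);` followed by `auto lines = read_utf16le_lines(file);`. Reading raw bytes through a narrow stream does not involve the locale at all. That may help here, since only the "C" and "POSIX" locales seem to be available in this MinGW setup.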
Edit: I think there may be a bug in mingw64/mingw-w64-x86_64-gcc 6.3.0-2, which is provided by MSYS2. I have tried everyone's suggestions, and imbuing the locale into the streams just renders no output at all. I do know that only two native locales are provided, "C" and "POSIX". I was going to try Visual Studio, but my internet connection is too slow for the 4 GB download. I have used ICU as @Andrei R. suggested, and it is working great.
I would have preferred to use the standard library, but I am OK with this. If you need this solution, please take a look at my code: https://pastebin.com/qudy7yva