As an exercise, I am making a simple vocabulary trainer. The file I am reading contains the vocabulary, which also includes special characters such as ä, ö and ü.
However, I have been struggling to read this file without getting mangled characters instead of the appropriate special characters.
I understand why this is happening, but not how to solve it correctly.
Here is my attempt:
Unit(const char* file)
    : unitName(getFileName(file), false)
{
    std::wifstream infile(file);
    std::wstring line;
    infile.imbue(std::locale(infile.getloc(), new std::codecvt_utf8<wchar_t, 0x10ffff, std::consume_header>()));
    while (std::getline(infile, line))
    {
        std::wcout << line.c_str() << "\n";
        this->vocabulary.insert(parseLine(line.c_str(), Language::EN_UK, Language::DE));
    }
}
The reading process stops as soon as an entry is reached that contains a special character.
I have even been able to change the code slightly to see where exactly it stops reading:
while (infile.eof() == false)
{
    std::getline(infile, line);
    std::wcout << line.c_str() << "\n";
    this->vocabulary.insert(parseLine(line.c_str(), Language::EN_UK, Language::DE));
}
If I do it like this, the output repeats the entry with the special character endlessly, but cuts it off right before the special character would appear.
Instead of:
cross-class|klassenübergreifend
It says:
cross-class|klassen
cross-class|klassen
cross-class|klassen
cross-class|klassen
.
.
.
This leads me to believe that the special character gets misinterpreted as a line end by getline.
I do not care whether I have to use getline or something else, but in order for my parse function to work, the string it gets needs to represent one line of the file. Therefore, reading the entire buffer into a single string won't work unless I do the separation myself.
How can I properly and neatly read a UTF-8 file line by line?
Note: I looked at other questions on here, but most of the answers either use getline or explain why the problem occurs without showing how to solve it.