
If I read special characters (such as ä) from a file and then compare them in an `if`, the comparison never matches.

    std::wstring c;
    std::wifstream file;
    file.open("test.txt");
    wchar_t tmp;
    while (file.get(tmp))  // stops before appending the end-of-file value
    {
        c += tmp;
    }
    file.close();

    size_t l = c.length();

    for (size_t i = 0; i < l; i++)
    {
        wchar_t a = c[i];

        if (a == L'ä') {
            std::cout << "if triggered.";
        }
    }

But when I initialize a `wchar_t` with the special character directly, the comparison does work:

    wchar_t a = L'ä';

    if (a == L'ä') {
        std::cout << "if triggered";
    }

And if I write the wstring that was loaded from the file back out to a file, I get the original text back; nothing weird happens there.

Strox
  • What encoding is the file you are reading (test.txt) and what encoding is the source file with the character? – KompjoeFriek May 19 '22 at 19:27
  • What encoding does your file use? What OS are you using? – NathanOliver May 19 '22 at 19:27
  • Print out the numeric value for the character and see what it is; you'll probably see a difference between `std::cout << (int)c[i] << "\n";` and `std::cout << (int)L'ä' << "\n";` – PeterT May 19 '22 at 19:30
  • The code is using (I assume) UTF-16 encoding, while the txt file uses UTF-8. And when I did the int conversion, it did produce different numbers. – Strox May 19 '22 at 19:36
  • If your file is UTF-8, then you need a UTF-8 library to read it. – NathanOliver May 19 '22 at 19:38
  • A bit of shameless self-promotion (and, hopefully, a solution to your problem): https://stackoverflow.com/a/51356708/5743288 – Paul Sanders May 19 '22 at 22:21
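To make PeterT's suggestion concrete, the mismatch can be reproduced without a file. This sketch assumes test.txt is UTF-8 and that a default-locale `std::wifstream` turns each byte into its own `wchar_t`, which is consistent with the differing numbers the asker reported. `widen_bytes` is a hypothetical helper that mimics that assumed behaviour. "ä" is U+00E4, but in UTF-8 it is the two bytes 0xC3 0xA4, so the loop sees two characters, neither of which equals `L'\u00E4'`.

```cpp
#include <string>

// Hypothetical helper: mimics (by assumption) a default-locale std::wifstream
// reading a UTF-8 file, where every byte becomes its own wchar_t.
std::wstring widen_bytes(const std::string& bytes)
{
    std::wstring out;
    for (unsigned char b : bytes) {   // unsigned char keeps 0xC3 positive
        out += static_cast<wchar_t>(b);
    }
    return out;
}

// widen_bytes("\xC3\xA4") -> two wchar_t values: 0xC3 (195) and 0xA4 (164)
// L'\u00E4' (the "ä" in the if) is a single wchar_t: 0xE4 (228)
```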

1 Answer


This depends on the file's encoding. I'd assume it is UTF-8 in this case.
The code below should work:

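    #include <codecvt>
    #include <cstddef>
    #include <fstream>
    #include <iostream>
    #include <iterator>
    #include <locale>
    #include <string>

    int main()
    {
        // Read the whole file as raw bytes; `file >> str` would stop at the first whitespace.
        std::ifstream file("D:/test.txt", std::ios::binary);
        std::string str((std::istreambuf_iterator<char>(file)),
                        std::istreambuf_iterator<char>());

        // Decode the UTF-8 bytes into wide characters.
        std::wstring_convert<std::codecvt_utf8<wchar_t>> myconv;
        std::wstring wstr = myconv.from_bytes(str);

        for (std::size_t i = 0; i < wstr.length(); i++)
        {
            if (wstr[i] == L'ä') {
                std::cout << "if triggered.";
            }
        }
    }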

However, `std::codecvt_utf8` (like `std::wstring_convert`) is deprecated as of C++17.
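If you would rather not depend on the deprecated facet at all, one portable option (a swapped-in technique, not part of the original answer) is to decode the bytes yourself. The sketch below decodes UTF-8 into `std::u32string`, which sidesteps `wchar_t` being 16 bits on Windows and 32 bits on Linux. `decode_utf8` is a hypothetical helper name; it checks lead bytes, continuation bytes and truncation, but does not reject overlong encodings or surrogates, so treat it as a sketch rather than a fully validating decoder.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into UTF-32 code points.
// Checks lead/continuation bytes and truncation, but (as a sketch) does
// not reject overlong encodings or surrogate code points.
std::u32string decode_utf8(const std::string& in)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes that follow

        if (lead < 0x80)              { cp = lead;        extra = 0; }  // 0xxxxxxx
        else if ((lead >> 5) == 0x06) { cp = lead & 0x1F; extra = 1; }  // 110xxxxx
        else if ((lead >> 4) == 0x0E) { cp = lead & 0x0F; extra = 2; }  // 1110xxxx
        else if ((lead >> 3) == 0x1E) { cp = lead & 0x07; extra = 3; }  // 11110xxx
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (i + extra >= in.size())
            throw std::runtime_error("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cont = static_cast<unsigned char>(in[i + k]);
            if ((cont >> 6) != 0x02)  // continuation bytes look like 10xxxxxx
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (cont & 0x3F);
        }
        out += cp;
        i += extra + 1;
    }
    return out;
}
```

With the file read into a `std::string str` as in the answer, each element of `decode_utf8(str)` can be compared against `U'\u00E4'` (or `U'ä'` if the source file is compiled as UTF-8).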

For C++17 and later, if you are using MSVC, I recommend `CString`. It's easy to use and works with almost every version of C++:

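    #include <fstream>
    #include <iostream>
    #include <iterator>
    #include <string>

    int main()
    {
        // Read the whole file as raw UTF-8 bytes; `file >> str` would stop at the first whitespace.
        std::ifstream file("D:/test.txt", std::ios::binary);
        std::string str((std::istreambuf_iterator<char>(file)),
                        std::istreambuf_iterator<char>());

        // Convert UTF-8 to UTF-16. CStringW (rather than CString) stays wide
        // even when the project is not built with UNICODE defined.
        CStringW wstr(CA2W(str.c_str(), CP_UTF8));
        int l = wstr.GetLength();

        for (int i = 0; i < l; i++)
        {
            if (wstr[i] == L'ä') {
                std::cout << "if triggered.";
            }
        }
    }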

For non-MFC projects, add `#include <atlstr.h>`.
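If, as in the question, you only need to detect one specific character, a portable alternative (not from the original answer) is to skip the conversion entirely and search the UTF-8 text for that character's byte sequence. This works because UTF-8 is self-synchronizing: a complete encoded character cannot match starting in the middle of another one. The sketch assumes the file is valid UTF-8; `count_utf8` is a hypothetical name, and "ä" is written as the byte escapes `\xC3\xA4` so the result does not depend on the source-file encoding.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: count occurrences of a UTF-8 encoded character
// (or any byte sequence) in UTF-8 text, without converting to wide chars.
std::size_t count_utf8(const std::string& text, const std::string& needle)
{
    if (needle.empty()) return 0;  // avoid an endless loop on an empty needle
    std::size_t count = 0;
    for (std::size_t pos = text.find(needle); pos != std::string::npos;
         pos = text.find(needle, pos + needle.size())) {
        ++count;
    }
    return count;
}
```

For example, `count_utf8(str, "\xC3\xA4") > 0` would play the role of the `if (a == L'ä')` check once the file is read into `str`.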

zpc