
When I try to convert a string to lowercase, non-English characters are converted to �.

I take a string from the first file, convert it to lowercase, and write it to the second file.

Let's say first line would be:

1.Testing Strings For Conversion

2.Test Des Chaînes Pour La Conversion (French, via Google Translate)

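#include <algorithm>
#include <cctype>
#include <fstream>
#include <string>
using namespace std;

int main() {
    ifstream old_file;
    ofstream new_file;
    string str;

    old_file.open("first.txt");
    new_file.open("second.txt");

    // Read the first line, lowercase it one byte at a time, and write it out.
    getline(old_file, str);
    transform(str.begin(), str.end(), str.begin(), ::tolower);
    new_file << str;

    old_file.close();
    new_file.close();
}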

Result will be:

1.testing strings for conversion

2.test des cha�nes pour la conversion

The input file encoding is "UTF-8 with BOM".
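
A likely reason for the � is that `::tolower` works on one byte at a time, while `î` is stored in UTF-8 as the two bytes 0xC3 0xAE; if either byte is altered, the sequence is no longer valid UTF-8. As a minimal sketch (the helper name `ascii_lower_keep_utf8` is hypothetical, not a standard function), lowercasing only the ASCII letters and copying every other byte unchanged should at least keep the text intact, although non-ASCII uppercase letters such as `Î` would stay uppercase:

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: lowercases only ASCII letters 'A'..'Z' and copies
// every other byte unchanged, so multi-byte UTF-8 sequences are not altered.
// It does NOT lowercase non-ASCII letters.
std::string ascii_lower_keep_utf8(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a')
                                      : static_cast<char>(c);
    });
    return s;
}
```

Calling `ascii_lower_keep_utf8(str)` in place of the `transform` line would be one way to try it; full Unicode case conversion still needs a locale-aware or library-based approach.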

  • Use this: https://en.cppreference.com/w/cpp/locale/tolower (but it's complicated to always get this right) – m2j May 11 '23 at 11:09
  • I'd use the [ICU](https://icu.unicode.org/design/cpp) library. – Eljay May 11 '23 at 11:11
  • Added encoding info "UTF-8 with BOM" – Mefisto1029 May 11 '23 at 11:18
  • With your current setup, which uses the `C` locale, there are no characters other than ASCII. You need to select an encoding so the standard library knows how to handle non-ASCII characters. – Marek R May 11 '23 at 11:18
  • @m2j `tolower` won't work, as it only operates on single characters and so will fail with multi-byte encodings like UTF-8. Nothing in the standard library will help you here; you'll need an external Unicode library. – Alan Birtles May 11 '23 at 11:19
  • Death to "UTF-8 with BOM". BOMs have no place in UTF-8 text. – Botje May 11 '23 at 11:19
  • [UTF-8 Everywhere](http://utf8everywhere.org/), no BOMs! – Eljay May 11 '23 at 11:21
  • The only simple and reliable way to do it is use wide characters: https://coliru.stacked-crooked.com/a/b2e337f04ea1de9b https://godbolt.org/z/Wvv3TYzv9 Note that setting imbue on stream defines its encoding, so you need to: `std::wifstream f{"your_file.txt"}; f.imbue(std::locale{"C.UTF-8"});`. Remember to set global locale. – Marek R May 11 '23 at 11:51
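
The wide-character approach from the last comment could be sketched roughly as follows. This is only an illustration under assumptions: the helper names `to_lower_wide` and `lower_first_line` are hypothetical, and the locale name `"C.UTF-8"` is taken from the comment and may not exist on every system (in which case the sketch simply reports failure). How non-ASCII letters are mapped depends entirely on the locale that is actually installed.

```cpp
#include <fstream>
#include <locale>
#include <stdexcept>
#include <string>

// Lowercases each wide character with the ctype facet of the given locale.
std::wstring to_lower_wide(std::wstring s, const std::locale& loc) {
    const auto& facet = std::use_facet<std::ctype<wchar_t>>(loc);
    for (wchar_t& ch : s) ch = facet.tolower(ch);
    return s;
}

// Hypothetical helper: reads the first line of in_path through a wide stream,
// lowercases it, and writes it to out_path. loc_name is an assumption.
// Returns false if the locale is unavailable or a file cannot be used.
bool lower_first_line(const char* in_path, const char* out_path,
                      const char* loc_name = "C.UTF-8") {
    std::locale loc;
    try {
        loc = std::locale(loc_name);  // throws if the locale is not installed
    } catch (const std::runtime_error&) {
        return false;
    }

    std::wifstream in(in_path);
    std::wofstream out(out_path);
    if (!in || !out) return false;

    // Imbuing the streams tells them which encoding the files use.
    in.imbue(loc);
    out.imbue(loc);

    std::wstring line;
    std::getline(in, line);
    out << to_lower_wide(line, loc);
    return static_cast<bool>(out);
}
```

With the file names from the question, a call might look like `lower_first_line("first.txt", "second.txt")`. The BOM, if present, is not treated specially in this sketch.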
