I wrote a library that builds a crossword grid, and it works as intended for English words.
However, when I use Portuguese words such as s1 = 'milhão' and s2 = 'sã' with 'std::string', the function that looks for an intersection between s1 and s2 fails. I understand why: 'ã' is encoded in 2 bytes (in UTF-8), so an index such as 's1[4]' or 's2[1]' refers to a single byte, not to a whole letter.
If I use 'std::u16string' or 'std::wstring' the function works.
How can I safely compare strings letter by letter without knowing whether each letter is encoded in a single byte or in several bytes? Should I always use 'std::u32string' if I want my programs to be usable worldwide?
The truth is that I have never had to worry about localization in my programs, so I am somewhat confused.
Here is a program to illustrate my problem:
#include <cstdint>
#include <iostream>
#include <string>

void using_u16() {
  std::u16string _str1(u"milhão");
  std::u16string _str2(u"sã");
  auto _size1{_str1.size()};
  auto _size2{_str2.size()};
  for (decltype(_size2) _i2 = 0; (_i2 < _size2); ++_i2) {
    for (decltype(_size1) _i1 = 0; (_i1 < _size1); ++_i1) {
      if (_str1[_i1] == _str2[_i2]) {
        std::wcout << L"1 - 'milhão' met 'sã' in " << _i1 << ',' << _i2
                   << std::endl;
      }
    }
  }
}

void using_wstring() {
  std::wstring _str1(L"milhão");
  std::wstring _str2(L"sã");
  auto _size1{_str1.size()};
  auto _size2{_str2.size()};
  for (decltype(_size2) _i2 = 0; (_i2 < _size2); ++_i2) {
    for (decltype(_size1) _i1 = 0; (_i1 < _size1); ++_i1) {
      if (_str1[_i1] == _str2[_i2]) {
        std::wcout << L"2 - 'milhão' met 'sã' in " << _i1 << ',' << _i2
                   << std::endl;
      }
    }
  }
}

void using_string() {
  std::string _str1("milhão");
  std::string _str2("sã");
  auto _size1{_str1.size()};
  auto _size2{_str2.size()};
  for (decltype(_size2) _i2 = 0; (_i2 < _size2); ++_i2) {
    for (decltype(_size1) _i1 = 0; (_i1 < _size1); ++_i1) {
      if (_str1[_i1] == _str2[_i2]) {
        std::cout << "3 - 'milhão' met 'sã' in " << _i1 << ',' << _i2
                  << std::endl;
      }
    }
  }
}

int main() {
  using_u16();
  using_wstring();
  using_string();
  return 0;
}
As I explained, when calling 'using_string()' nothing is printed.