I'm working on a project in which case-sensitive operations need to be replaced with case-insensitive ones. After some reading on this, the types of data to be considered are:
- ASCII characters
- Non-ASCII characters
- Unicode characters
Please let me know if I've missed anything in the list.
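To make the ASCII case concrete, here is a minimal sketch of the kind of comparison I have in mind. The names `ascii_tolower` and `ascii_iequals` are hypothetical, and the sketch assumes the strings are plain byte strings (for example ASCII or UTF-8). It folds only the letters A-Z, so it cannot match non-ASCII letters that differ in case:

```cpp
#include <string>

// Hypothetical helper: folds only the 26 ASCII letters A-Z to lower case.
// Every other byte, including each byte of a multi-byte UTF-8 sequence,
// is returned unchanged.
inline char ascii_tolower(char c)
{
    return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
}

// Hypothetical helper: case-insensitive equality for ASCII text only.
bool ascii_iequals(const std::string& lhs, const std::string& rhs)
{
    if (lhs.size() != rhs.size())
        return false;
    for (std::string::size_type i = 0; i < lhs.size(); ++i)
    {
        if (ascii_tolower(lhs[i]) != ascii_tolower(rhs[i]))
            return false;
    }
    return true;
}
```

With this, `ascii_iequals("Test String", "TEST string")` is true, but `ascii_iequals("ZO\xC3\x8B", "zo\xC3\xAB")` (the UTF-8 bytes for "ZOË" and "zoë") is false, because the bytes that encode Ë and ë fall outside A-Z and are compared as-is.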
Do the above need to be handled separately, or are there C++ libraries that can handle them all regardless of the type of data?
Specifically:
Does the Boost library provide support for this? If so, are there examples or documentation on how to use the APIs?
I learned about IBM's International Components for Unicode (ICU). Is this a library that provides support for case-insensitive operations? If so, are there examples or documentation on how to use the APIs?
Finally, which of the aforementioned (or other) approaches is better, and why?
Thanks!
Based on the comments and answers, I wrote a sample program to understand this better:
#include <iostream> // std::cout
#include <string> // std::string
#include <locale> // std::locale, std::tolower
using namespace std;

void ascii_to_lower(string& str)
{
    std::locale loc;
    std::cout << "Ascii string: " << str;
    std::cout << "Lower case: ";
    for (std::string::size_type i=0; i<str.length(); ++i)
        std::cout << std::tolower(str[i],loc);
    return;
}

void non_ascii_to_lower(void)
{
    std::locale::global(std::locale("en_US.UTF-8"));
    std::wcout.imbue(std::locale());
    const std::ctype<wchar_t>& f = std::use_facet<std::ctype<wchar_t> >(std::local
    std::wstring str = L"Zoë Saldaña played in La maldición del padre Cardona.";
    std::wcout << endl << "Non-Ascii string: " << str << endl;
    f.tolower(&str[0], &str[0] + str.size());
    std::wcout << "Lower case: " << str << endl;
    return;
}

void non_ascii_to_upper(void)
{
    std::locale::global(std::locale("en_US.UTF-8"));
    std::wcout.imbue(std::locale());
    const std::ctype<wchar_t>& f = std::use_facet<std::ctype<wchar_t> >(std::local
    std::wstring str = L"¥£ªÄë";
    std::wcout << endl << "Non-Ascii string: " << str << endl;
    f.toupper(&str[0], &str[0] + str.size());
    std::wcout << "Upper case: " << str << endl;
    return;
}

int main ()
{
    string str="Test String.\n";
    ascii_to_lower(str);
    non_ascii_to_upper();
    non_ascii_to_lower();
    return 0;
}
The output is:
Ascii string: Test String. Lower case: test string.
Non-Ascii string: ▒▒▒▒▒ Upper case: ▒▒▒▒▒
Non-Ascii string: Zo▒ Salda▒a played in La maldici▒n del padre Cardona. Lower case: zo▒ salda▒a played in la maldici▒n del padre cardona.
Although the non-ASCII strings seem to be converted to upper and lower case, some of the characters are not displayed in the output. Why is this?
On the whole, does the sample code look OK?
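One thing I noticed while writing this up: the sample writes to both std::cout and std::wcout, and mixing narrow and wide output on the same stdout is not guaranteed to work. I have not confirmed that this is what causes the missing characters; the terminal encoding could equally be involved. Below is a minimal sketch of a wide-only variant I may try. The names `to_lower_wide` and `print_lowered` are hypothetical, and the locale is passed in by the caller rather than hard-coded, since a locale name such as "en_US.UTF-8" may not exist on every system:

```cpp
#include <iostream>
#include <locale>
#include <string>

// Hypothetical helper: returns a lower-cased copy of `text`, using the
// std::ctype<wchar_t> facet of the locale supplied by the caller.
std::wstring to_lower_wide(std::wstring text, const std::locale& loc)
{
    const std::ctype<wchar_t>& facet = std::use_facet<std::ctype<wchar_t> >(loc);
    if (!text.empty())
        facet.tolower(&text[0], &text[0] + text.size());
    return text;
}

// Hypothetical helper: prints the original and lower-cased text through
// std::wcout only, so narrow and wide output are never mixed.
void print_lowered(const std::wstring& text, const std::locale& loc)
{
    std::wcout.imbue(loc);
    std::wcout << L"Original:   " << text << L'\n'
               << L"Lower case: " << to_lower_wide(text, loc) << L'\n';
}
```

A call would look like `print_lowered(L"Test String.", std::locale::classic());`. Which non-ASCII characters are converted depends on the locale that is passed in, so this sketch only changes how the output is written, not what the facet is able to convert.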