
When I use tolower() with non-English characters in C++, it doesn't work as expected. I searched for this issue and came across something about locales, but I am not sure what the best solution is.

My sample code is below:

printf("%c ",tolower('Ü'));
Kyle Hale
Yavuz
  • Mentioned in Bart van Ingen Schenau's answer, but worth highlighting: *characters cannot generally be assumed to have lower/upper case variants*. Read up on [letter case](http://en.wikipedia.org/wiki/Letter_case). – unwind Dec 13 '12 at 12:49

2 Answers


Unfortunately, the standard C++ library does not have sufficient support for changing the case of all possible non-English characters (insofar as those characters have case variants at all). The limitation comes from the C++ standard's assumption that a character and each of its case variants occupy exactly one char object (or one wchar_t object for wide characters). For non-English characters that is not guaranteed, and whether it holds depends on how the characters are encoded.

If your environment uses a single-byte encoding for the relevant characters, this might give you what you want:

std::locale::global(std::locale(""));  // adopt the environment's locale
std::cout << std::tolower('Ü', std::locale());

With wide characters, you will probably have more luck:

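std::locale::global(std::locale(""));  // environment locale; also updates the C locale (setlocale)
std::wcout << std::tolower(L'Ü', std::locale());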

but even that won't give the correct result for toupper(L'ß'), which should be the two-character sequence L"SS".

If you need support for all characters, take a look at the ICU library, in particular the part about case mappings.

Bart van Ingen Schenau
  • Is that solution only for Windows, or is it the same for Ubuntu? I tried them, but the code isn't working. – Yavuz Dec 13 '12 at 12:44
  • @Yavuz: It should also work on Ubuntu, but it might depend on your locale setting. See the answer by Konrad Rudolph for how you can easily use a non-default locale. – Bart van Ingen Schenau Dec 13 '12 at 16:59
  • I agree with what's said here minus the 'ß' conversion. The standard says that that should *not* convert to a "SS" nor to a 'ẞ': http://stackoverflow.com/a/37571371/2642059 – Jonathan Mee Jun 01 '16 at 18:45

As Bart has shown, C++ simply doesn’t like multi-byte encodings. Luckily, you can use Boost.Locale to solve this without too much hassle. Here’s a simple example:

#include <iostream>
#include <locale>
#include <string>
#include <boost/locale.hpp>

int main() {
    boost::locale::generator gen;
    std::locale loc = gen("en_US.UTF-8");
    std::string line;
    while (std::getline(std::cin, line))
        std::cout << boost::locale::to_lower(line, loc) << '\n';
}

To compile, we need to link to the Boost.Locale library:

g++ lower.cpp -o lower -lboost_locale

And when we execute it, we get the following:

$ ./lower <<< 'ICH HÄTTE GERNE EINEN SÜßEN HASEN'
ich hätte gerne einen süßen hasen
Konrad Rudolph