I've been trying to call std::tolower() with a different locale but it seems that something is going wrong. My code is as follows:

int main() {
    std::locale::global(std::locale("es_ES.UTF-8"));
    std::thread(&function, this); // Repeated some times
    // wait for threads
}

void function() {
    std::string word = "HeÉllO";
    std::transform(word.begin(), word.end(), word.begin(), cToLower);
}

int cToLower(int c) {
    return std::tolower(c, std::locale());
}

So when I try to execute this program I get:

terminate called after throwing an instance of 'std::bad_cast'
terminate called recursively
  what():  std::bad_cast
Aborted (core dumped)

Executing return std::tolower(c); instead works fine, but it only converts the 'standard' characters to lowercase, not É.

I have some threads which are executing the same function simultaneously, using C++11 and compiling with g++ (in case it has something to do with it).

I was wondering whether this is the correct way to implement what I want to do, or whether there's some other way of doing it.

Thanks!

lpares12
  • Try to find out if the problem is with threading or with the locale. 1.) Does it work in a single-threaded program? 2.) Does it work with an English string and the default locale in a multi-threaded program? – Christian Hackl Feb 26 '17 at 15:09
  • Try `std::tolower((unsigned char)c, std::locale())`. From [the documentation](http://en.cppreference.com/w/cpp/string/byte/tolower): "If the value of `ch` is not representable as unsigned char and does not equal `EOF`, the behavior is undefined." The problem is, `char` is usually signed, and characters like `É` are represented as negative values, which then fall outside the range of `unsigned char` – Igor Tandetnik Feb 26 '17 at 15:13
  • @IgorTandetnik, Correct!. please make that an answer. :-) – WhiZTiM Feb 26 '17 at 15:15
  • @IgorTandetnik still a bad_cast is thrown. I understand the logic you present but looks like I will have to use wstring instead. – lpares12 Feb 26 '17 at 18:27
  • @IgorTandetnik That applies only to the single-parameter version from the C standard library, not the two-parameter one from C++ `<locale>`. – T.C. Feb 27 '17 at 14:03

2 Answers

Unlike the version of tolower that came from C (which takes characters converted to unsigned char and then to int), the <locale> version of tolower is meant to be called with characters directly. It is defined to use the std::ctype<charT> facet of the locale, and the only two std::ctype specializations guaranteed to be available are std::ctype<char> and std::ctype<wchar_t>. Thus:

char cToLower(char c) {
    return std::tolower(c, std::locale());
}

Note that this is still a char-by-char transform; if the character occupies more than one byte, it is unlikely to handle it properly.
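
To make the multi-byte caveat concrete, here is a minimal sketch. The names lower_bytes and e_acute_upper are made up for illustration and are not from the question; the sketch assumes the string is UTF-8 and uses the classic "C" locale by default so that it does not depend on which locales are installed. It shows that É occupies two chars in UTF-8, and that a char-by-char tolower hands each byte to the facet separately, so the ASCII letters change while the two bytes of É come back unchanged:

```cpp
#include <algorithm>
#include <cstddef>
#include <locale>
#include <string>

// Hypothetical helper: lowercase each char on its own through the
// std::ctype<char> facet of loc. It has no notion of multi-byte
// sequences; every byte is handed to the facet separately.
std::string lower_bytes(std::string s,
                        const std::locale& loc = std::locale::classic()) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [&loc](char c) { return std::tolower(c, loc); });
    return s;
}

// "É" (U+00C9) spelled out as its two UTF-8 bytes, so the example does
// not depend on the encoding of the source file.
const std::string e_acute_upper = "\xC3\x89";
```

Calling lower_bytes("He\xC3\x89llO") with the default argument gives back "he\xC3\x89llo": only the single-byte letters were lowered. Getting é out of É needs either a single-byte encoding whose locale is installed, or a wide-character (or otherwise Unicode-aware) approach.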

T.C.
  • Thanks for the reply! I see what you mean, but although this compiles, it doesn't transform the special characters like `É` to `é`. – lpares12 Feb 27 '17 at 17:33
  • @lpares12 Well, I can't test it, but is `É` actually a single `char`? If it actually takes more than one byte, then a byte-by-byte transformation (which is what you are doing) won't cut it. – T.C. Feb 27 '17 at 17:45
  • `É` and `é` are represented in decimal as `144` and `130` respectively, so both should fit inside a byte. But now that I think of it, it should be represented as `unsigned char`, otherwise they both would have negative values. – lpares12 Feb 27 '17 at 18:32
  • @lpares12 Well, [U+00C9](https://codepoints.net/U+00C9) ("LATIN CAPITAL LETTER E WITH ACUTE") is encoded as two bytes in UTF-8, likewise for U+00E9 ("LATIN SMALL LETTER E WITH ACUTE"). I'm not sure what encoding you are using. – T.C. Feb 27 '17 at 18:57

Check whether the locale you are trying to use is installed on your system. For example, I had to install the Spanish locale before the code below stopped crashing. Additionally, you could work with wstring instead. Update: after some digging, here is a good explanation of using wstring, with all the pros and cons (cons mostly).

#include <thread>
#include <locale>
#include <algorithm>
#include <iostream>
#include <string>   // std::wstring

//forward declaration
void function();

int main() {
    std::locale::global(std::locale("es_ES.utf8"));
    std::thread test(&function);
    test.join();
}

wchar_t cToLower(wchar_t c) {        
    return std::tolower(c, std::locale());    
}

void function() {
    std::wstring word = L"HeÉllO";
    std::transform(word.begin(), word.end(), word.begin(), cToLower);
    std::wcout << word;
}

Output:

heéllo
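
If you want to test for the locale up front rather than crash, something along these lines may help. This is only a sketch: locale_available is a made-up helper name, and it relies on the std::locale constructor throwing std::runtime_error when the name is not known to the system:

```cpp
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns true if a locale with this name can be
// constructed on the current system, false otherwise.
bool locale_available(const std::string& name) {
    try {
        std::locale loc(name);   // throws std::runtime_error if unknown
        (void)loc;               // only the construction matters here
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

You could call locale_available("es_ES.utf8") before std::locale::global(...) and fall back to another locale, or print a readable error, when it returns false.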
j2ko