I need to reverse a wstring. I have this code:

#include <algorithm>  // std::reverse
#include <iostream>
#include <string>
#include <locale>

int main() {
    std::wstring s;
    std::getline(std::wcin, s);

    // Print the numeric value of every element that was read.
    for (const auto &i : s) {
        std::wcout << (int) i << " ";
    }
    std::wcout << std::endl;

    std::wcout << s << std::endl;

    // Reverse element by element and print the result.
    std::reverse(s.begin(), s.end());
    std::wcout << s << std::endl;
    return 0;
}

ASCII characters are encoded in 1 byte, and I can easily reverse them:

echo -n "papa" | ./reverse
112 97 112 97
papa
apap

But when I enter Cyrillic text, whose characters are encoded in more than 1 byte, I get this output:

echo -n "папа" | ./reverse
208 191 208 176 208 191 208 176
папа
�пап�

How can I properly reverse that string?

P.S. I'm using OS X.

0x1337

1 Answer

Your system, OS X, uses UTF-8. So there is no reason for you to use wstring or wchar_t. And indeed this is where the confusion comes from!

You see, when you call getline() with a wstring on OS X, it does not decode multi-byte characters at all. Each wchar_t is indeed four bytes wide, but it holds a single input byte, the same 0-255 range of values you would get with a regular "narrow" string. So when you pipe your four Cyrillic characters, which take two bytes each in UTF-8, to your program, you end up with a wstring of length 8. C++ does not decode the UTF-8 here, but your terminal does, hence the text looks like four characters in the terminal but eight in C++. Reversing those eight elements splits each two-byte sequence apart, which is why the reversed output contains invalid characters.
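To make the 8-versus-4 mismatch concrete, here is a minimal sketch that counts code points in a narrow UTF-8 string. The helper name count_code_points is hypothetical, and the code assumes the input is valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts UTF-8 code points in a narrow string.
// Assumes valid UTF-8. Continuation bytes have the bit pattern 10xxxxxx,
// so every byte that is NOT a continuation byte starts a new code point.
std::size_t count_code_points(const std::string &s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}
```

For the Cyrillic input from the question, s.size() reports the number of bytes, while count_code_points(s) reports the number of characters the terminal displays.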

A commenter on your question was right to point out this question: How do I reverse a UTF-8 string in place? - that is really all you need, once you realize that you aren't dealing with wide strings at all.
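As a rough illustration, one way to reverse a narrow UTF-8 string by code point is sketched below. This is not necessarily the method used in the linked question: it builds a new string instead of reversing in place. The name reverse_utf8 is hypothetical, the input is assumed to be valid UTF-8, and combining marks are not kept together with their base characters.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns a copy of s with its UTF-8 code points
// in reverse order. Assumes valid UTF-8; does not handle combining marks.
std::string reverse_utf8(const std::string &s) {
    std::string out;
    out.reserve(s.size());
    std::size_t end = s.size();
    while (end > 0) {
        std::size_t start = end - 1;
        // Step back over continuation bytes (10xxxxxx) to the lead byte.
        while (start > 0 &&
               (static_cast<unsigned char>(s[start]) & 0xC0) == 0x80) {
            --start;
        }
        // Copy the whole code point, keeping its bytes in original order.
        out.append(s, start, end - start);
        end = start;
    }
    return out;
}
```

In the program from the question, this would mean reading into a std::string with std::getline(std::cin, s) and printing reverse_utf8(s) through std::cout, with no wide types involved.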

John Zwinck