I tried a very simple program in C++:

#include <iostream>
#include <string>

int main()
{
  std::wstring test = L"asdfa-";
  test += u'ç';
  std::wcout << test;
}

But the result was:

asdfa-?

I could not print 'ç' with either cout or wcout. How can I print this string correctly?

OS: Linux.

PS: I use wstring instead of string because I sometimes need to calculate the length of the string, and that length must match what appears on the screen.

PS: I need to concatenate the Unicode character; it can't go in the string constructor.

Alex

4 Answers

First, here's something that does work:

#include <iostream>
#include <string>

int main() {
    std::string test = "asdfa-";
    test += "ç";
    std::cout << test;
}

I used regular strings here and kept everything in UTF-8 (this assumes the source file is saved as UTF-8, which GCC and Clang expect by default). I think you already know that this would work, because you mentioned that you wanted to concatenate the ç rather than just leaving it in the string constructor.

Dealing with char, char16_t, char32_t, and wchar_t in C++ has never really been fun. You have to be careful with the L, u, and U prefixes.

However, where possible, if you deal with UTF-8 strings and avoid individual characters, things generally work much better. Since most consoles (with the possible exception of old Windows machines) understand UTF-8 well, this is the approach that most often just works. So if you have wide characters, see if you can convert them to regular std::string objects and work in that domain.
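If a character count is still needed while staying in UTF-8, one option is to count code points rather than bytes. Here is a minimal sketch; the function name utf8_code_points is made up for illustration, and it assumes the input is valid UTF-8. It counts code points only, so it does not tell you how many screen cells a string occupies.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a UTF-8 std::string.
// Every code point starts with exactly one byte that is NOT a continuation
// byte (continuation bytes have the bit pattern 10xxxxxx).
// Assumes the input is valid UTF-8; malformed input is not detected.
std::size_t utf8_code_points(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {  // skip continuation bytes
            ++count;
        }
    }
    return count;
}
```

For the string from the question, "asdfa-ç" encoded as UTF-8, size() reports 8 bytes while this helper reports 7 code points.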

Ray Toal
  • The problem for me is that I need to calculate the string length, and 'ç' makes the length differ from what is printed on the screen. – Alex Jan 12 '18 at 17:06
  • Why do you need to calculate the length of the string? That operation is usually a code smell; most applications do not need it. Anyway, that's messy too. Do you mean the number of bytes, the number of characters, or the number of graphemes? – Ray Toal Jan 12 '18 at 18:34
  • @Alex If you need to calculate the length of what is printed on the screen in character cells, you're out of luck. A Unicode character (== code point, a `wchar_t` in C and C++ on Linux) may take up 0, 1, or more cells, and a cell may contain 1 or more Unicode characters. There is no way to determine any of this using Standard C++. – n. m. could be an AI Jan 12 '18 at 23:01

One general way of handling this would be:

  1. Input (convert from multibyte to wide using current locale)

  2. Your App: work with wide strings

  3. Output or saving to a file (convert from wide to multibyte)

For wide string manipulations such as counting characters or searching for a substring, there is the wcsXXX family of functions (wcslen, wcsstr, and so on).
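The conversions in steps 1 and 3 can be sketched with std::mbstowcs and std::wcstombs, which use the current C locale. The helper names to_wide and to_multibyte are hypothetical, and the sketch assumes the program has already called std::setlocale(LC_ALL, "") (or std::locale::global) and that the strings contain no embedded null characters.

```cpp
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical helper for step 1: multibyte (e.g. UTF-8) -> wide,
// using the current C locale.
std::wstring to_wide(const std::string& in) {
    // A multibyte string never yields more wide characters than it has bytes.
    std::wstring out(in.size(), L'\0');
    std::size_t n = std::mbstowcs(&out[0], in.c_str(), out.size());
    if (n == static_cast<std::size_t>(-1)) {
        throw std::runtime_error("invalid multibyte sequence for this locale");
    }
    out.resize(n);  // n is the number of wide characters actually written
    return out;
}

// Hypothetical helper for step 3: wide -> multibyte,
// using the current C locale.
std::string to_multibyte(const std::wstring& in) {
    // MB_CUR_MAX is the maximum number of bytes per character in this locale.
    std::string out(in.size() * MB_CUR_MAX, '\0');
    std::size_t n = std::wcstombs(&out[0], in.c_str(), out.size());
    if (n == static_cast<std::size_t>(-1)) {
        throw std::runtime_error("character not representable in this locale");
    }
    out.resize(n);  // n is the number of bytes actually written
    return out;
}
```

In between (step 2), the program would work on the std::wstring. Its size() is the number of wchar_t elements, which on Linux is the number of code points and not necessarily the number of screen cells.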

StPiere

If you are using libstdc++ on Linux: you forgot an essential call at the beginning of the program

std::locale::global(std::locale(""));

This is assuming you are on Linux and your locale supports UTF-8.
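To show where the call goes, here is a sketch of the question's program split into two hypothetical helpers (make_test_string and print_with_user_locale are names I made up). It assumes gcc/libstdc++ on Linux and an environment locale such as en_US.UTF-8; std::locale("") throws std::runtime_error if the environment names a locale that is not installed. The imbue call is an extra precaution on my part, not something the one-line fix above requires.

```cpp
#include <iostream>
#include <locale>
#include <string>

// Builds the string from the question. L'\u00e7' is the wide-character
// literal for 'ç', written as an escape so the source encoding does not matter.
std::wstring make_test_string() {
    std::wstring test = L"asdfa-";
    test += L'\u00e7';
    return test;
}

// Installs the user's environment locale before any output, then prints.
// Assumption: the environment locale uses UTF-8.
void print_with_user_locale(const std::wstring& text) {
    std::locale::global(std::locale(""));  // the essential call
    std::wcout.imbue(std::locale());       // optional: also set it on wcout
    std::wcout << text << L'\n';
}
```

A main function would then just call print_with_user_locale(make_test_string()) before writing anything else to std::wcout.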

If you are using libc++: forget about using wstreams. This library does not support I/O of wide characters in a useful way (i.e. translation to UTF-8 like libstdc++ does).

Windows has a wholly separate set of quirks regarding Unicode. You are lucky if you don't have to deal with them.

demo with gcc/libstdc++ and a call to std::locale

demo with gcc/libstdc++ and no call to std::locale

Different versions of clang/libc++ behave differently with this example: some output ? instead of the non-ASCII character, some output nothing; some crash on the call to std::locale, some don't. None do the right thing, which is printing the ç, or maybe I just haven't found one that works. I don't recommend using libc++ if you need anything related to locale or wchar_t.

n. m. could be an AI

I solved this problem using a conversion function:

#include <iostream>
#include <string>
#include <codecvt>
#include <locale>

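// Converts a wide string to a UTF-8 encoded std::string.
// Note: std::wstring_convert and <codecvt> are deprecated since C++17.
std::string wstr2str(const std::wstring& wstr) {
  std::wstring_convert<std::codecvt_utf8<wchar_t>> myconv;
  return myconv.to_bytes(wstr);
}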

int main()
{
  std::wstring test = L"asdfa-";
  test += L'ç';
  std::string str = wstr2str(test);
  std::cout << str;
}
Alex