
I am holding hex values (Unicode code points) in unsigned integers (size_t) and would like to convert them into wchar_t to hold in a data structure and, optionally, print them to std::cout as their UTF-8 symbol/character when valid.

I've tried casting without much success: for example, with size_t h = 0x262E;, printing the value after a cast to wchar_t gives 9774.

Some minimal code:

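#include <clocale>   // std::setlocale, LC_ALL
#include <cstddef>   // std::size_t
#include <iostream>
#include <vector>

int main() {
    std::setlocale( LC_ALL, "" );
    auto v = std::vector<std::size_t>( 3, 0x262E ); // 3x peace symbols
    v.at( 1 ) += 0x10; // now a moon symbol

    for( auto &el : v )
        std::cout << el << " "; // prints the numeric value, not the symbol

    return 0;
}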

Output:

9774 9790 9774

What I want:

☮ ☾ ☮

I can print the symbols using printf( "%lc ", (wchar_t) el );. Is there a better "modern" C++ solution to this?

I need to be able to print any code point in the range U+0000 to U+27BF (encoded as UTF-8), on Linux only.

Azeem
  • related: https://stackoverflow.com/a/402918/2502409 – Nazar554 Nov 05 '18 at 12:22
  • Use `std::wcout` for printing instead of `std::cout`. 9774 is the decimal value of 0x262E btw. – Pezo Nov 05 '18 at 12:53
  • What are the hex values? Unicode code points are in the range `0` to `0x10FFFF`. UTF8 uses variable length encoding, each code point is represented by 1, 2, 3, or 4 bytes, the conversion is not trivial, casting won't work between different encodings. Show real example of your input. – Barmak Shemirani Nov 05 '18 at 16:01
  • I think you need to show your code because there are a lot of moving parts. I believe everything below 128 is its ASCII code in Unicode/UCS-16/UCS-32, etc. Just assign the ASCII code for the character. Also, UTF-8 uses 8-bit `char`'s, not `wchar_t`'s. Finally, `wchar_t` is usually printed as an `unsigned short` or `unsigned int` (depending on the platform), and not a character. – jww Nov 05 '18 at 16:56

1 Answer


To print wide characters, use std::wcout instead of std::cout, and cast each value to wchar_t.

Here's your corrected functional code (live example):

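#include <clocale>   // std::setlocale, LC_ALL
#include <cstddef>   // std::size_t
#include <iostream>
#include <vector>

int main() {
    std::setlocale( LC_ALL, "" ); // use the environment's (e.g. UTF-8) locale
    auto v = std::vector<std::size_t>( 3, 0x262E ); // 3x peace symbols
    v.at( 1 ) += 0x10; // now a moon symbol

    for( auto &el : v )
        std::wcout << static_cast<wchar_t>( el ) << " "; // <--- Corrected statement

    return 0;
}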

Output:

☮ ☾ ☮

If your hex values are stored as strings, you can follow this solution.

Azeem
  • Thanks @Azeem, that works. For future readers, a side note: `std::cout` and `std::wcout` don't mix, and unexpected output can result [link](https://stackoverflow.com/questions/8947949/mixing-cout-and-wcout-in-same-program). – Keyboard embossed forhead Nov 06 '18 at 12:49