
There's plenty of information out there on how to output Unicode characters in C++. I'm able to do this with the code below, which outputs a lowercase 'h'.

cout << '\u0068' << endl;

Is it possible to go the other way? That is, input an 'h' and have cout display 0068 (the Unicode number) in a console?

I have googled this to death and found nothing. I don't want to build a lookup table, switch statement, etc. I thought there might be a simple way of converting Unicode characters into their Unicode numbers, using something as simple as the above. Any clues?

Yu Hao
domonica

1 Answer


The key is to give the value a type for which the compiler picks the overload of operator<<() that produces numeric output. You can set the std::hex manipulator all you want, but a char written to cout (or a wchar_t written to wcout) always selects the operator<<() that outputs a readable character, not its value. To get the compiler to pick the operator<<() that prints a value, you have to cast the character to a numeric type.

In my example below, I cast it to a uint32_t because 32 bits is sufficient to represent any Unicode code point (the largest is U+10FFFF). I could cast it to an int on my system, since ints are 32 bits there, but to keep your code portable even to tiny embedded systems where int is 16 bits and you hardly have enough memory to process ASCII, let alone Unicode, you should cast it to a type that's guaranteed to be 32 bits or more. long would also be sufficient.

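#include <cstdint>
#include <iomanip>
#include <iostream>

int main()
{
    // std::hex has no effect here: a character type still selects the
    // character-printing overload, so this prints 'h', not 68.
    std::wcout << std::hex << L'\u0068' << std::dec << std::endl;

    wchar_t c;
    std::wcout << L"Type a character... " << std::endl;
    std::wcin >> c;
    // Casting to an integer type selects the numeric overload instead.
    // std::setfill pads with zeros so 'h' is shown as 0068.
    std::wcout << L"Your character was '" << c << L"' (unicode "
               << std::hex << std::setfill(L'0') << std::setw(4)
               << static_cast<std::uint32_t>(c) << L")\n";
}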
phonetagger
  • Thanks for explaining it to me. The code works too. I never would have figured it out without this. Have a great day. Thanks again. – domonica Sep 08 '13 at 05:23
  • Works well with Latin characters, but exits on Cyrillic input. Any ideas why? – Drey Jul 24 '17 at 01:09
  • @Drey - Where does it "exit"? And in what manner does it exit? On the `std::wcin >> c;` line or in the `static_cast<uint32_t>(c)`? If you're not sure, break the chained-together last line into separate lines to `std::wcout`, and add `<< std::endl` to the end of each to see better where it's actually exiting. Also possibly step through it with a debugger. (And did you build in 'debug' mode? You might get better debugging info.) I don't know anything about Cyrillic characters or how they are stored; do they have Unicode representations? – phonetagger Jul 24 '17 at 13:44
  • @phonetagger By "exits" I meant that I added a loop to the program to wait for a symbol, print its Unicode number, and wait for a character again, endlessly. It worked well with Latin symbols but exited on Cyrillic. "I don't know anything about cyrillic characters or how they are stored; do they have unicode representations?" Yes, they do (https://en.wikipedia.org/wiki/Cyrillic_script_in_Unicode). Your original program outputs `Your character was '' (unicode 0)` on every Cyrillic input. – Drey Jul 26 '17 at 02:01
  • @Drey - How can I duplicate this behavior? How do I input Cyrillic characters on a U.S. English keyboard? What sort of loop construct did you use? Just `while (true) { ... }`? And to clarify, initially you said the program "exits". To me that means the program terminates abnormally; i.e. it wouldn't have printed "Your character was...", or perhaps printed that but crashed immediately afterward. Now you seem to suggest that the only strange behavior is that it prints nothing for the character, and prints `0` for the number. Does it then somehow unexpectedly exit that `while(true) { ... }` loop? – phonetagger Jul 26 '17 at 15:12