
I have a simple encoding program and want to add some Unicode characters. I'm doing some math, then converting the decimal value to hex, then to Unicode. The problem is that the conversion from decimal to hex gives me a string, and I don't know a way to convert a string to a char (not a char[]).

How can I convert a decimal number to its Unicode character (char) equivalent without creating a string?

Here's some code where I was trying to figure it out; we've established this won't work:

#include <iostream>
#include <sstream>
#include <string>
using namespace std;

int main() {
    int decimal_value = 1111;
    stringstream ss;
    ss << hex << decimal_value;   // res becomes "457"
    string res(ss.str());
    std::ostringstream oss;
    oss << "\\u0" << res;         // builds the literal text "\u0457"
    string var = oss.str();
    // var holds the six characters \u0457, not the encoded character,
    // so the first line prints the escape sequence verbatim:
    cout << var << "\n" << "\u0457" << "\n";

    return 0;
}

output:

\u0457
ї
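
One approach that seems to avoid strings entirely is to encode the numeric code point into UTF-8 bytes by hand and write those bytes out. This is only a rough sketch: it assumes the terminal expects UTF-8, does no validation (surrogates, out-of-range values), and print_code_point is just an illustrative helper name, not something from a library.

#include <cstdint>
#include <iostream>

// Write the UTF-8 encoding of a single Unicode code point to stdout.
// No intermediate "\uXXXX" string is built; the number is encoded directly.
void print_code_point(std::uint32_t cp) {
    if (cp < 0x80) {                       // 1 byte: plain ASCII
        std::cout << static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes
        std::cout << static_cast<char>(0xC0 | (cp >> 6))
                  << static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes
        std::cout << static_cast<char>(0xE0 | (cp >> 12))
                  << static_cast<char>(0x80 | ((cp >> 6) & 0x3F))
                  << static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes
        std::cout << static_cast<char>(0xF0 | (cp >> 18))
                  << static_cast<char>(0x80 | ((cp >> 12) & 0x3F))
                  << static_cast<char>(0x80 | ((cp >> 6) & 0x3F))
                  << static_cast<char>(0x80 | (cp & 0x3F));
    }
}

int main() {
    int decimal_value = 1111;   // U+0457, CYRILLIC SMALL LETTER YI
    print_code_point(static_cast<std::uint32_t>(decimal_value));
    std::cout << "\n";          // prints: ї (on a UTF-8 terminal)
    return 0;
}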
  • For direct use of the C++ standard library this is OS-dependent. Unfortunately. The shoemaker's children are the only ones without shoes. – Cheers and hth. - Alf Jan 27 '16 at 17:47
  • In Unixland you can treat the `1111` as a `wchar_t` value which you can convert to UTF-8 and just output via `cout` (see the sketch below these comments). – Cheers and hth. - Alf Jan 27 '16 at 17:48
  • In Windows it's more complicated, and involves use of non-standard features or 3rd party libraries. My under-construction [cppx library](https://github.com/alf-p-steinbach/cppx) is one example. It works by installing new custom buffers in the standard wide streams, which buffers use UTF-16 based direct console i/o via Windows API functions. – Cheers and hth. - Alf Jan 27 '16 at 17:51
  • Here's your program using cppx: (http://codepad.org/67Af2kXR). It works OK in Windows. When I get to that point in the development it will also work nicely in Unix-land (maybe it does already, I don't know, sorry). – Cheers and hth. - Alf Jan 27 '16 at 17:59
  • @Cheersandhth.-Alf Any reason your example is recommending `wchar_t` rather than, say, `int` or `uint32_t`? I mean, it doesn’t really matter but `1111` isn’t a *character* itself (which the type might suggest) but rather a code point. – Konrad Rudolph Jan 27 '16 at 17:59
  • @KonradRudolph: `char` is also just a code point, especially in Unix-land (which has largely standardized on UTF-8, a variable length encoding). That's what `wchar_t` is used for. Alternatively one might use `char16_t` or `char32_t`. But I'm not sure of the conversion support for those in available compilers. – Cheers and hth. - Alf Jan 27 '16 at 18:02
  • @Cheersandhth.-Alf Well but `char` and `wchar_t` have overloaded iostream operators which suggests their dual use as characters (not code points). And in UNIX-land, combined with iostreams, `char` isn’t so much a code point but rather a byte in an UTF-8 byte stream. But yeah, it’s a rather moot argument since these are essentially just all integer types. More seriously, I think there’s no guarantee that `wchar_t` is big enough to hold every Unicode code point. – Konrad Rudolph Jan 27 '16 at 18:06
  • @KonradRudolph: With UTF-8 encoding a single `char` can only be treated as a full character, rather than an encoding value that's part of a character, if its value is in the range 0 through 127, which range is mapped to original ASCII. Sorry for not fixing the terminology in previous comment. I guess it's late in the day. ;-) – Cheers and hth. - Alf Jan 27 '16 at 18:07
  • @Cheersandhth.-Alf Ugh, I meant “byte in an UTF-8 byte stream”, not “character in an UTF-8 byte stream”. I’ve fixed my comment. – Konrad Rudolph Jan 27 '16 at 18:09
  • Regarding size, `wchar_t` is only 16 bits in Windows, but that's enough for all of the Basic Multilingual Plane of Unicode, which corresponds to original Unicode 1.0. And in Windows, that's all a console can handle, so the issue of non-BMP values is moot for portable console i/o. – Cheers and hth. - Alf Jan 27 '16 at 18:11
  • Possible duplicate of [How to print Unicode character in C++?](http://stackoverflow.com/questions/12015571/how-to-print-unicode-character-in-c) – Adrian McCarthy Jan 27 '16 at 19:02
  • @AdrianMcCarthy It's not a duplicate of that, as far as I can tell. I'm generating the Unicode from a string and trying to output it; that thread simply assigns the character directly. I have no idea why it's not the same. Why doesn't a string stream of "\u0457" output the same thing as a char? OK, I'll try changing the datatype. – j0h Jan 27 '16 at 19:49
  • @j0h: There are several intertwined problems preventing this from working: data types, encoding, OS support. The other question has some decent answers that address most of these issues. – Adrian McCarthy Jan 27 '16 at 20:34
  • @Alf Regarding size, `wchar_t` is 32 bits in Windows if you use GCC. It's just 16 bits if you use Microsoft C/C++ compiler. – harper Jan 28 '16 at 06:04
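
A rough sketch of the `wchar_t` route suggested in the comments above. It assumes a Unix-like system whose environment locale is UTF-8 (e.g. en_US.UTF-8); as the comments note, Windows needs non-standard features or third-party libraries instead.

#include <iostream>
#include <locale>

int main() {
    std::locale::global(std::locale(""));  // adopt the environment's locale (assumed UTF-8)
    std::wcout.imbue(std::locale());       // wcout now converts wchar_t to that encoding

    int decimal_value = 1111;              // U+0457
    std::wcout << static_cast<wchar_t>(decimal_value) << L'\n';
    return 0;
}

As an alternative, C++11's `std::wstring_convert` with `std::codecvt_utf8<char32_t>` (deprecated in C++17) can turn a `char32_t` code point into UTF-8 bytes, which touches on the `char16_t`/`char32_t` conversion-support question raised above.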

0 Answers