
Based on my previous question, *C++: Convert hex representation of UTF16 char into decimal (like python's `int(hex_data, 16)`)*, I would like to know how to convert a string into a `char16_t` Unicode character:

As

int main()
{   
    char16_t c = u'\u0b7f';
    std::cout << (int)c << std::endl;
    return 0;
}

yields decimal 2943 perfectly fine, I now need to know how to inject a four-digit hex string into `char16_t c = u'\uINSERTHERE'`.

My `stringstream_s` contains four hex digits, like `'0b82'` (decimal: 2946) or `'0b77'` (decimal: 2935).

I tried

std::string stringstream_s;
....stringstream_s gets assigned....
char16_t c = (u'\u%s',stringstream_s);

which gives me the error "no suitable conversion function from std::string to char16_t exists".

So, basically: how do I convert a string into a UTF-16 code unit?

I need to know the equivalent of `u'\u0b7f'` when I just have a bare string of `'0b7f'`.

  • Your code absolutely doesn't make sense. And previously, you converted an int(-like) to ... an int, now a string to an int(-like)? – deviantfan Nov 11 '16 at 12:47
  • Well I know, so tell me how it makes sense. I'm not into c++ .....I need to know the equivalent of u'\u0b7f' when I just have a bare string of '0b7f' – Dr. John James Cobra Nov 11 '16 at 12:49
  • `std::stoi` or something like that. Btw., are you aware that UTF16 has 4-byte codepoints too? – deviantfan Nov 11 '16 at 12:51
  • what do you mean by "4-byte codepoints"? I thought UTF16 is a 2-byte encoding – Dr. John James Cobra Nov 11 '16 at 12:52
  • Why are you asking about conversions when the actual question is "How can I insert one string into another"? Just use `u16string::insert`. To convert between Unicode encodings use the `codecvt` header. – Panagiotis Kanavos Nov 11 '16 at 12:59
  • [This UTF-16 Wikipedia article](https://en.wikipedia.org/wiki/UTF-16) could be useful. – Some programmer dude Nov 11 '16 at 13:00
  • @Someprogrammerdude apart from the general (mis)understanding of Unicode, the question should be about C++/STL functions for string manipulations and conversions. `u16string`s are still `basic_string`s – Panagiotis Kanavos Nov 11 '16 at 13:03
  • @Dr.JohnJamesCobra I mean that UTF16 has some 2byte and some 4byte codepoints. Supporting only 2byte is not UTF16. – deviantfan Nov 11 '16 at 13:08
  • For the OP: When you do `char16_t c = u'\u0b7f';` it is exactly the same as doing `char16_t c = 0x0b7f;` (or `char16_t c = 2943;` for that matter). Now think about how you can get an integer out from your input string stream, then it's just a simple assignment. – Some programmer dude Nov 11 '16 at 13:08
  • @Dr.JohnJamesCobra Even if you think it's insulting, I too recommend reading it. You have grave misunderstandings about Unicode. Demanding we explain everything to you, if there are already good learning resources, won't get you anywhere. – deviantfan Nov 11 '16 at 13:12
  • @Dr.JohnJamesCobra the question *is* extremely unclear. It gives the impression that you think Unicode requires some special treatment through byte processing. That's not true. What is the codepage of the input string? If it's plain ASCII, it's exactly the same as a UTF8 string, in fact, UTF8 strings are stored as char arrays or `std::string` (not smart in my opinion). Why don't you try to convert the input with `codecvt` ? – Panagiotis Kanavos Nov 11 '16 at 13:14
  • The page [String and Character Literals in C++](https://msdn.microsoft.com/en-us/library/69ze775t.aspx) explains how UTF8,16,32 are supported in C++11 and C++14, which C++ and STL library types should be used when – Panagiotis Kanavos Nov 11 '16 at 13:19
  • (Addition to my previous comment, some keywords to google/think about after the first wiki page: list of unicode characters, UTF16, UTF8, BOM, multi-codepoint characters, normalization) – deviantfan Nov 11 '16 at 13:19
  • @Panagiotis I tried codecvt with something like `std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> converted_char;` but this gives me weird linker issues. I don't know if some ".lib" is missing as the debug message is unclear (..and btw these linker issues are annoying to someone who is using an interpreted language most of the time...) – Dr. John James Cobra Nov 11 '16 at 13:21
  • @Panagiotis **Do you know how to convert from std::string to char16_t?** When I resolve this issue I have the solution (together with Programmer Dude's proposal to use `0x0b7f` equivalently to `u'\u0b7f'`). – Dr. John James Cobra Nov 11 '16 at 13:24
  • You want `std::codecvt_utf8_utf16`. If you have "weird linking issues", resolve them by e.g. asking a question here. – n. m. could be an AI Nov 11 '16 at 13:27

1 Answer


You need to convert the `std::string` to an integer first, then you can type-cast the integer to `char16_t`. In this case, `std::stoi()` would be the simplest solution:

std::string s = "0b82";
char16_t ch = (char16_t) std::stoi(s, nullptr, 16); 

Alternatively, you can use a `std::istringstream`:

std::string s = "0b82";
unsigned short i;
std::istringstream(s) >> std::hex >> i;
char16_t ch = (char16_t) i; 

– Remy Lebeau