In fact the conversion is a little more complicated.
string s2 = "\u94b1";
is in fact the equivalent of:
const char cs2[] = { '\xe9', '\x92', '\xb1', '\0' }; string s2 = cs2;
That means you are initializing s2 with the 3 characters (bytes) that make up the UTF-8 representation of 钱, assuming the compiler's execution character set is UTF-8 (the default for GCC and Clang). You can just examine s2.c_str()
to make sure of that.
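To see where those three bytes come from, the UTF-8 bit layout can be applied by hand. This is a minimal hand-written encoder, not a standard function (encode_utf8 is a hypothetical name), assuming the input is a Unicode code point held in a char32_t:

```cpp
#include <string>

// Hypothetical helper: encodes one Unicode code point as UTF-8 by hand.
// Returns an empty string for surrogates (U+D800..U+DFFF) and values above U+10FFFF.
std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {                        // 1 byte:  0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {                // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {              // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return std::string();           // surrogates cannot be encoded
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {            // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For U+94B1 this takes the 3-byte branch: 0x94b1 >> 12 is 0x9, giving 0xE9; (0x94b1 >> 6) & 0x3F is 0x12, giving 0x92; 0x94b1 & 0x3F is 0x31, giving 0xB1. Those are exactly the bytes in cs2.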
So to process the 6 raw characters '\', 'u', '9', '4', 'b', '1' held in string s1 = "\\u94b1"; (which is what you get when you read the text from input), you must first extract the code point as a number. That is easy: skip the first two characters and read the rest as hexadecimal:
unsigned int ui;
std::istringstream is(s1.c_str() + 2); // skip the leading "\u"; needs <sstream>
is >> std::hex >> ui;
ui is now 0x94b1.
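If the input may be malformed, the same extraction can be wrapped with some validation. This is a minimal sketch, assuming the escape is always exactly the six-character "\uXXXX" form (it does not handle "\UXXXXXXXX" or surrogate pairs); parse_u_escape is a hypothetical name:

```cpp
#include <cctype>     // std::isxdigit
#include <cstddef>    // std::size_t
#include <sstream>    // std::istringstream
#include <stdexcept>  // std::invalid_argument
#include <string>

// Hypothetical helper: parses the six raw characters of a "\uXXXX" escape
// (as read from a file or std::cin) and returns the code point it names.
unsigned int parse_u_escape(const std::string& esc) {
    if (esc.size() != 6 || esc[0] != '\\' || esc[1] != 'u')
        throw std::invalid_argument("expected a \\uXXXX escape");
    // Reject anything that is not a hex digit, so stray signs or spaces
    // are not silently accepted by operator>>.
    for (std::size_t i = 2; i < esc.size(); ++i)
        if (!std::isxdigit(static_cast<unsigned char>(esc[i])))
            throw std::invalid_argument("non-hex digit in escape");
    unsigned int ui = 0;
    std::istringstream is(esc.substr(2)); // skip the two leading characters "\u"
    is >> std::hex >> ui;                 // four hex digits always fit
    return ui;
}
```

With s1 = "\\u94b1", parse_u_escape(s1) returns 0x94b1, the same value as ui above.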
Now, provided you have a C++11-compliant standard library, you can convert it with std::codecvt_utf8 (declared in <codecvt>; note that it is deprecated since C++17):
wchar_t wc = ui;
std::codecvt_utf8<wchar_t> conv;
const wchar_t *wnext;
char *next;
char cbuf[5] = {0}; // up to 4 UTF-8 bytes; the zero fill keeps a terminating null
std::mbstate_t state{}; // must start in the initial conversion state
conv.out(state, &wc, &wc + 1, wnext, cbuf, cbuf + 4, next);
cbuf now contains the 3 bytes representing 钱 in UTF-8 followed by a terminating null, so you can finally do:
string s3 = cbuf;
cout << s3 << endl;
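The conversion steps can also be packaged into one function. This is a sketch, assuming a standard library that still ships <codecvt> (compilers may emit deprecation warnings); to_utf8_codecvt is a hypothetical name. Unlike the fragment above, it checks the result of out() and builds the string from the range [buf, next), so it does not rely on a null terminator:

```cpp
#include <codecvt>   // std::codecvt_utf8 (C++11, deprecated since C++17)
#include <cwchar>    // std::mbstate_t
#include <locale>    // std::codecvt_base
#include <string>

// Hypothetical helper: converts one code point to UTF-8 with
// std::codecvt_utf8<wchar_t>. Returns an empty string on failure.
std::string to_utf8_codecvt(unsigned int cp) {
    std::codecvt_utf8<wchar_t> conv;
    std::mbstate_t state{};                   // initial conversion state
    const wchar_t wc = static_cast<wchar_t>(cp);
    const wchar_t* wnext = nullptr;
    char buf[4];                              // a UTF-8 sequence is at most 4 bytes
    char* next = nullptr;
    std::codecvt_base::result r =
        conv.out(state, &wc, &wc + 1, wnext, buf, buf + 4, next);
    if (r != std::codecvt_base::ok)
        return std::string();
    return std::string(buf, next);            // exactly the bytes that were written
}
```

With ui from above, cout << to_utf8_codecvt(ui) << endl; prints 钱 on a UTF-8 terminal. On platforms where wchar_t is 16 bits (Windows), a single wchar_t cannot hold code points above U+FFFF, so this sketch only covers the Basic Multilingual Plane there.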