Converting unicode strings and vice-versa

Question

I'm fairly new to Unicode strings and pointers, and I have no idea how conversion from Unicode to ASCII and vice versa works. This is what I'm trying to do:

const wchar_t *p = L"This is a string";

If I wanted to convert it to char*, how would the conversion from wchar_t* to char* (and vice versa) work?

Or by value, converting a std::wstring object to a std::string and vice versa:

std::wstring wstr = L"This is a string";

If I'm correct, can you just copy the string to a new buffer without conversion?

score 23 · Answer 1 · edited May 01 '19 at 18:49

In the future (VS 2010 already supports it), this will be possible in standard C++ (finally!):

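#include <codecvt>  // std::codecvt_utf8
#include <locale>   // std::wstring_convert
#include <string>

// Convert a wide string to a UTF-8 encoded byte string.
std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
const std::wstring wide_string = L"This is a string";
const std::string utf8_string = converter.to_bytes(wide_string);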

edited May 01 '19 at 18:49

Drew Dormann

answered Jan 24 '11 at 20:29

Philipp

I think there is a typo: `std::wstring` in the last line should be `std::string` – Tyler Liu Mar 24 '13 at 06:22

That the last line should be `std::string`: confirmed from http://en.cppreference.com/w/cpp/locale/wstring_convert/to_bytes – Dan Nissenbaum Apr 29 '14 at 16:25

score 5 · Answer 2 · answered Jan 25 '11 at 09:51

The conversions from ASCII to Unicode and vice versa are quite trivial. By design, the first 128 Unicode values are the same as ASCII (in fact, the first 256 are equal to ISO-8859-1).

So the following code works on systems where char is ASCII and wchar_t is Unicode:

#include <cstring>  // strlen
#include <string>
const char* ASCII = "Hello, world";
std::wstring Unicode(ASCII, ASCII + strlen(ASCII));

You can't reverse it this simply: 汉 exists in Unicode but not in ASCII, so how would you "convert" it?
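To make the asymmetry concrete, here is a minimal sketch of both directions. The function names `ascii_to_wide` and `wide_to_ascii` are hypothetical, and the sketch assumes the narrow input really is pure 7-bit ASCII. The narrowing direction reports failure instead of guessing when it meets a character outside ASCII:

```cpp
#include <string>

// Widening: every ASCII value maps to the same Unicode code point.
// Assumes the input contains only 7-bit ASCII characters.
std::wstring ascii_to_wide(const std::string& ascii) {
    return std::wstring(ascii.begin(), ascii.end());
}

// Narrowing: only possible if every character fits in ASCII.
// Returns false (and leaves `out` partially filled) on the first
// character that has no ASCII equivalent.
bool wide_to_ascii(const std::wstring& wide, std::string& out) {
    out.clear();
    out.reserve(wide.size());
    for (wchar_t wc : wide) {
        if (static_cast<unsigned long>(wc) > 0x7F) {
            return false;  // e.g. 汉 (U+6C49) cannot be represented
        }
        out.push_back(static_cast<char>(wc));
    }
    return true;
}
```

Whether to fail, substitute a placeholder such as '?', or transliterate is a design decision the caller has to make; the sketch simply fails.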

answered Jan 25 '11 at 09:51

MSalters

There is also from_bytes which you can use like --- std::wstring_convert> converter; const std::wstring wstring = converter.from_bytes(string); – TinyRacoon Oct 22 '20 at 16:14
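The `from_bytes` direction mentioned in this comment could look roughly like the sketch below. The template argument in the comment was lost, so `std::codecvt_utf8<wchar_t>` is an assumption taken from Answer 1, and the wrapper names `utf8_to_wide` and `wide_to_utf8` are hypothetical. Note that `std::wstring_convert` and `<codecvt>` have been deprecated since C++17, so compilers may warn about them.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// UTF-8 bytes -> wide string. Throws std::range_error on invalid input.
std::wstring utf8_to_wide(const std::string& utf8) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
    return converter.from_bytes(utf8);
}

// Wide string -> UTF-8 bytes, the direction shown in Answer 1.
std::string wide_to_utf8(const std::wstring& wide) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
    return converter.to_bytes(wide);
}
```

A round trip such as `wide_to_utf8(utf8_to_wide(s))` should give back `s` for any valid UTF-8 input whose code points fit in `wchar_t`.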

score 3 · Answer 3 · answered Jan 24 '11 at 19:42

C++ by itself doesn't offer this functionality. You'll need a separate library, like libiconv.
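As a rough illustration of the library route, the sketch below converts UTF-8 to `wchar_t` with iconv. It assumes a POSIX system whose iconv accepts the encoding name "WCHAR_T" (glibc and GNU libiconv do) and whose `iconv()` takes a `char**` input pointer; the function name `utf8_to_wide_iconv` is hypothetical. On systems where iconv is a separate library you may need to link with `-liconv`.

```cpp
#include <iconv.h>

#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

std::wstring utf8_to_wide_iconv(const std::string& utf8) {
    // Target encoding first, source encoding second.
    iconv_t cd = iconv_open("WCHAR_T", "UTF-8");
    if (cd == (iconv_t)-1) {
        throw std::runtime_error("iconv_open failed");
    }

    // iconv wants mutable buffers, so copy the input.
    std::vector<char> in(utf8.begin(), utf8.end());
    // A UTF-8 string never decodes to more characters than it has bytes.
    std::vector<wchar_t> out(utf8.size() + 1);

    char* in_ptr = in.data();
    std::size_t in_left = in.size();
    char* out_ptr = reinterpret_cast<char*>(out.data());
    std::size_t out_left = out.size() * sizeof(wchar_t);

    const std::size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);
    if (rc == static_cast<std::size_t>(-1)) {
        throw std::runtime_error("iconv failed: invalid or incomplete input");
    }

    // Number of wide characters actually written.
    const std::size_t written = out.size() - out_left / sizeof(wchar_t);
    return std::wstring(out.data(), written);
}
```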

answered Jan 24 '11 at 19:42

Thomas

score 3 · Accepted Answer · answered Jan 24 '11 at 19:52

The solutions are platform-dependent. On Windows, use the MultiByteToWideChar and WideCharToMultiByte API functions. On Unix/Linux platforms, the iconv library is quite popular.

answered Jan 24 '11 at 19:52

Eugene Mayevski 'Callback

Beware that `MultiByteToWideChar` has a bug when converting codepage 50225 (Korean - ISO-2022-KR): it converts characters incorrectly, as noted on https://support.microsoft.com/en-us/kb/960293. The suggested workaround is to use `IMultiLanguage::ConvertStringToUnicode` instead, which converts the same characters properly. Please update the answer to make this more visible. – Coder12345 Sep 02 '15 at 12:52

score 3 · Answer 5 · answered Jan 24 '11 at 20:39

Use the C standard library functions mbstowcs and wcstombs.
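A minimal sketch of how these two functions might be wrapped for std::string and std::wstring follows. The wrapper names `to_wide` and `to_narrow` are hypothetical. Both functions interpret the narrow string according to the current C locale, so anything beyond plain ASCII assumes you have called `setlocale` with a suitable locale first; both also stop at the first embedded null character.

```cpp
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Multibyte (current C locale) -> wide.
std::wstring to_wide(const std::string& narrow) {
    // The wide result never has more characters than the input has bytes.
    std::vector<wchar_t> buf(narrow.size() + 1);
    const std::size_t n = std::mbstowcs(buf.data(), narrow.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1)) {
        throw std::runtime_error("invalid multibyte sequence");
    }
    return std::wstring(buf.data(), n);
}

// Wide -> multibyte (current C locale).
std::string to_narrow(const std::wstring& wide) {
    // Each wide character needs at most MB_CUR_MAX bytes.
    std::vector<char> buf(wide.size() * MB_CUR_MAX + 1);
    const std::size_t n = std::wcstombs(buf.data(), wide.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1)) {
        throw std::runtime_error("character not representable in this locale");
    }
    return std::string(buf.data(), n);
}
```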

answered Jan 24 '11 at 20:39

cpx

score 0 · Answer 6 · edited Nov 13 '15 at 09:47

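The widen() member function converts a char to a wchar_t:

char a = 'a';
wchar_t wa = std::wcin.widen(a);  // needs <iostream>; cin.widen() would return a plain char

Of course, you have to call it in a loop over the string, dereferencing the pointer for each character. The opposite conversion is done with narrow().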
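The loop can also be delegated to the `std::ctype<wchar_t>` facet, whose range overloads of `widen` and `narrow` process a whole buffer at once. The sketch below is one way to do that; the function names `widen_string` and `narrow_string` and the '?' fallback character are assumptions made for illustration, and the default classic ("C") locale only gives meaningful results for the basic character set.

```cpp
#include <locale>
#include <string>

// char -> wchar_t for every character, using the locale's ctype facet.
std::wstring widen_string(const std::string& narrow,
                          const std::locale& loc = std::locale::classic()) {
    const std::ctype<wchar_t>& facet = std::use_facet<std::ctype<wchar_t>>(loc);
    std::wstring wide(narrow.size(), L'\0');
    if (!narrow.empty()) {
        facet.widen(narrow.data(), narrow.data() + narrow.size(), &wide[0]);
    }
    return wide;
}

// wchar_t -> char; characters with no narrow equivalent become `fallback`.
std::string narrow_string(const std::wstring& wide, char fallback = '?',
                          const std::locale& loc = std::locale::classic()) {
    const std::ctype<wchar_t>& facet = std::use_facet<std::ctype<wchar_t>>(loc);
    std::string narrow(wide.size(), '\0');
    if (!wide.empty()) {
        facet.narrow(wide.data(), wide.data() + wide.size(), fallback, &narrow[0]);
    }
    return narrow;
}
```

This is a character-by-character mapping, not an encoding conversion: it does not produce or consume multibyte sequences such as UTF-8.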

edited Nov 13 '15 at 09:47

simon

answered Jan 24 '11 at 19:42

bratao
