#include <windows.h>

int main() {
    // "Chào" in Vietnamese
    wchar_t utf16[] = L"\x00ff\x00fe\x0043\x0000\x0068\x0000\x00EO\x0000\x006F";
    // Dump utf16: FF FE 43 0 68 0 E 4F 0 6F (right)
    int size = WideCharToMultiByte(CP_UTF8, 0, utf16, -1, NULL, 0, NULL, NULL);
    char *utf8 = new char[size];
    int k = WideCharToMultiByte(CP_UTF8, 0, utf16, -1, utf8, size, NULL, NULL);
    // Dump utf8: ffffffc3 ffffffbf ffffffc3 ffffffbe 43 0
    delete[] utf8;
}

Here is my code. When I convert the string to UTF-8, it shows the wrong result. What is wrong with my code?

edited by Deduplicator
asked by user2477

2 Answers

#include <cstdio>

int main() {
    wchar_t utf16[] = L"\uFEFFChào";
    int size = 5; // BOM + 4 characters

    for (int i = 0; i < size; ++i) {
        std::printf("%X ", static_cast<unsigned>(utf16[i]));
    }
}

This program prints out: FEFF 43 68 E0 6F

If printing each wchar_t you've read from the file gives FF FE 43 0 68 0 E 4F 0 6F, then the UTF-16 data is not being read from the file correctly. Those values represent the UTF-16 string `L"ÿþC\0h\0à\0o"`.

You don't show your code for reading from the file, but here's one way to do it correctly:

https://stackoverflow.com/a/10504278/365496
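As a separate illustration (not the approach from the linked answer), here is a minimal portable sketch, assuming the file is UTF-16LE with an optional FF FE byte order mark; the question never states the file's exact encoding. The key step is building each code unit from two bytes, low byte first, instead of putting every byte into its own wchar_t. `decode_utf16le` and `read_utf16le_file` are hypothetical helper names.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: decode raw UTF-16LE bytes (optionally starting with
// the FF FE byte order mark) into a std::u16string. Each code unit is built
// from TWO bytes, low byte first.
std::u16string decode_utf16le(const std::vector<unsigned char>& bytes) {
    std::size_t i = 0;
    // Skip the byte order mark if present.
    if (bytes.size() >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE) {
        i = 2;
    }
    if ((bytes.size() - i) % 2 != 0) {
        throw std::runtime_error("odd number of bytes in UTF-16LE data");
    }
    std::u16string out;
    out.reserve((bytes.size() - i) / 2);
    for (; i + 1 < bytes.size(); i += 2) {
        // Little-endian: first byte is the low half of the code unit.
        out.push_back(static_cast<char16_t>(bytes[i] | (bytes[i + 1] << 8)));
    }
    return out;
}

// Hypothetical helper: read the whole file in binary mode (no text-mode
// newline translation, no per-byte character conversion), then decode it.
std::u16string read_utf16le_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    std::vector<unsigned char> bytes((std::istreambuf_iterator<char>(in)),
                                     std::istreambuf_iterator<char>());
    return decode_utf16le(bytes);
}
```

For the bytes FF FE 43 00 68 00 E0 00 6F 00, this yields the four code units 0043 0068 00E0 006F, i.e. "Chào" without the BOM. On Windows, where wchar_t is 16 bits, those code units can be copied into a std::wstring and passed to WideCharToMultiByte.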

edited by Community
answered by bames53

You're reading the file incorrectly. Your dump of the input shows single bytes stored in wide characters. Your dump of the output is the byte sequence that results from encoding L"\xff\xfe\x43" to UTF-8. Because you pass -1 as the input length, WideCharToMultiByte treats the input as null-terminated, so the string is truncated at the first \x0000.
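To see where C3 BF C3 BE 43 comes from, it can help to encode both the intended string and the misread buffer by hand. Below is a minimal sketch of a UTF-16 to UTF-8 encoder; `utf16_to_utf8_until_nul` is a hypothetical helper, not a Windows API. It only imitates the -1 length convention (stop at the first null), and it simply rejects unpaired surrogates rather than reproducing any particular Windows replacement behavior.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: encode UTF-16 to UTF-8, stopping at the first U+0000
// the way a null-terminated (-1 length) input is handled.
std::string utf16_to_utf8_until_nul(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp == 0) {
            break;  // null terminator: everything after it is ignored
        }
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: must be followed by a low surrogate.
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF) {
                throw std::runtime_error("unpaired high surrogate");
            }
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::runtime_error("unpaired low surrogate");
        }
        if (cp < 0x80) {            // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {    // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {  // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                    // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Fed the correctly decoded u"Ch\u00E0o", it gives 43 68 C3 A0 6F. Fed the misread units FF FE 43 00 68 00, it gives C3 BF C3 BE 43, the same bytes as in the question's output dump (ÿ and þ each take two bytes, and the conversion stops at the first 00). The trailing 0 in the question's dump is the terminator WideCharToMultiByte writes when given -1 as the length.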

answered by Mark Ransom