
My intent is to write strings such as ñaäïüwiç (UTF-8 encoded) using the WriteFile function, so I have the following code:

#include <windows.h>
#include <fcntl.h>
#include <io.h>
#include <stdio.h>
#include <wchar.h>

int main(void) {
    WCHAR str[] = L"ñaäïüwiç \n";   // wide (UTF-16) string literal
    DWORD dwRead, dwWritten;
    // number of bytes to write, including the terminating L'\0'
    dwRead = (DWORD)((wcslen(str) + 1) * sizeof(WCHAR));
    HANDLE hParentStdOut = GetStdHandle(STD_OUTPUT_HANDLE);
    // write the buffer to standard output
    BOOL bSuccess = WriteFile(hParentStdOut, str, dwRead, &dwWritten, NULL);
    return 0;
}

Instead, this small program prints the following:

± a õ ´ ³ w i þ

How do I solve this problem?

tshepang
richardtk_1
  • Your use of `WCHAR` and `L"ñaäïüwiç \n"` seems to indicate you are using UTF-16, so you are probably writing UTF-16 to a UTF-8 (or ASCII) output stream. I guess that would explain the funny characters and the gaps in between. I also don't know if your editor can handle the characters given. – Rudy Velthuis Jul 13 '14 at 23:34



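It looks like your bytes are being interpreted as single-byte characters in the console's OEM code page instead. The character ñ is U+00F1, which UTF-16 encodes as the code unit 0x00F1; Windows stores it little-endian, i.e. as the bytes F1 00. The byte 0xF1 is ± in both code page 437 and code page 850, and the other glyphs you see (õ, ´, ³, þ) are exactly what bytes 0xE4, 0xEF, 0xFC and 0xE7 look like in code page 850, the default OEM code page for many Western European locales. So the bytes of your UTF-16 literal are not lost: the console simply reads each byte as a separate character, printing 0xF1 as ±, each 0x00 as a blank (the gaps), and so on.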

See the related post: How to Output Unicode Strings on the Windows Console

That post recommends using WriteConsoleW instead. Its parameters parallel those of WriteFile, except that the buffer must be UTF-16 and the count is given in WCHARs (UTF-16 code units) rather than bytes:

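    DWORD dwToWrite, dwWritten;
    // WriteConsoleW counts WCHARs, not bytes; no terminating L'\0' needed
    dwToWrite = (DWORD)wcslen(str);
    HANDLE hParentStdOut = GetStdHandle(STD_OUTPUT_HANDLE);
    // Works only on a real console handle; fails if stdout is redirected to a file or pipe
    BOOL bSuccess = WriteConsoleW(hParentStdOut, str, dwToWrite, &dwWritten, NULL);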
Community
user3814483
  • FWIW, `ñ` is U+00F1 in Unicode (and thus in UTF-16), and 0xF1 is `±` in code page 437. So it could just as well be that 0x00F1 is interpreted as bytes 0xF1 0x00 in codepage 437 ASCII, hence the "gap". – Rudy Velthuis Jul 14 '14 at 00:59
  • You are likely correct; I missed the gaps in his output. – user3814483 Jul 14 '14 at 01:05