
I'm trying to write wide characters to a file using MinGW C on Windows, but the wide characters seem to be omitted. My code:

const wchar_t* str = L"příšerně žluťoučký kůň úpěl ďábelské ódy";
FILE* fd = fopen("file.txt","w");
// FILE* fd = _wfopen(L"demo.txgs",L"w"); // attempt to open wide file doesn't help
fwide(fd,1); // attempt to force wide mode, doesn't help
fwprintf(fd,L"%ls",str);
// fputws(p,fd); // stops output after writing "p" (1B file size)
fclose(fd);

File contents

píern luouký k úpl ábelské ódy

The file size is 30B, so the wide chars really are missing. How do I get them written to the file?

As @chqrlie suggests in the comments: the result of

fwrite(str, 1, sizeof(L"příšerně žluťoučký kůň úpěl ďábelské ódy"), fd);

is 82 (I guess 2*30 + 2*10 (omitted chars) + 2 (wide trailing zero)).
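The arithmetic above can be checked with a short sketch. The helper names `count_latin1` and `utf16_bytes` are hypothetical, and the byte count assumes the 2-byte `wchar_t` used on Windows with no characters outside the Basic Multilingual Plane.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: count the characters of s whose code point
   fits in ISO-8859-1 (<= 0xFF), i.e. the ones that reached the file. */
size_t count_latin1(const wchar_t *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (; *s != L'\0'; ++s)
        if ((unsigned long)*s <= 0xFFu)
            ++n;
    return n;
}

/* Hypothetical helper: size in bytes of s, terminator included,
   assuming 2 bytes per wide character (as on Windows). */
size_t utf16_bytes(const wchar_t *s)
{
    assert(s != NULL);
    return (wcslen(s) + 1) * 2;
}
```

For the 40-character test string, `count_latin1` should account for the 30 bytes in the file and `utf16_bytes` for the value returned by `fwrite`.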

It also might be useful to quote from here

The external representation of wide characters in files are multibyte characters: These are obtained as if wcrtomb was called to convert each wide character (using the stream's internal mbstate_t object).

This explains why the ISO-8859-1 chars are single bytes in the file, but I don't know how to use this information to solve my problem. For the opposite task (reading multibyte UTF-8 into wide chars), I failed to get mbtowc working and ended up using WinAPI's MultiByteToWideChar.
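To see which characters the quoted conversion rejects, one can call `wcrtomb` directly on each character. This is only a sketch: `count_unconvertible` is a hypothetical name, and the result depends on the locale in effect when it is called (a program starts in the "C" locale unless `setlocale` has been called).

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: count the characters of s that wcrtomb
   cannot convert to a multibyte sequence in the current locale. */
size_t count_unconvertible(const wchar_t *s)
{
    char buf[MB_LEN_MAX];
    size_t failed = 0;
    assert(s != NULL);
    for (; *s != L'\0'; ++s) {
        mbstate_t st;
        memset(&st, 0, sizeof st);          /* fresh conversion state */
        if (wcrtomb(buf, *s, &st) == (size_t)-1)
            ++failed;                       /* no representation in this locale */
    }
    return failed;
}
```

Calling it on the test string before and after a `setlocale` call shows whether the chosen locale can represent the missing characters.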

Jan Turoň
  • You might need to use `_wfopen` – M.M Mar 10 '16 at 22:50
  • Also, embedding unicode in the source code may not work (my installation of mingw-w64 rejects your string literal at compile-time) – M.M Mar 10 '16 at 22:51
  • I just used `_wfopen`, the output is the same. The compile-time error is interesting. – Jan Turoň Mar 10 '16 at 22:54
  • Can you verify that `WCHAR` expands to `wchar_t`, or change the definition of `str` to `const wchar_t *s = ...`? – chqrlie Mar 10 '16 at 23:04
  • It looks like an encoding problem, none of the non ISO-8859-1 characters come out. Is your source file encoded in utf-8? Does the compiler recognize this encoding by default? Does it need a BOM? Can you print the return value of `fwide(fp, 1);`? Can you try `fwrite(str, 1, sizeof(L"příšerně žluťoučký kůň úpěl ďábelské ódy"), fp);`? – chqrlie Mar 10 '16 at 23:09
  • Yes, WCHAR expanded to wchar_t (I forgot to narrow my code, sorry). Yes, the source is UTF-8 and the compiler has no issue with that, [should be default](http://stackoverflow.com/a/12217048/343721). The source has a BOM. The result of `fwide` is 1 and the result of `fwrite` is 82. – Jan Turoň Mar 10 '16 at 23:23
  • You might have a problem with the locale, for some reason `fwprintf` filters characters that are not part of the ISO-8859-1 character set. Look at the manual page for `setlocale`. – chqrlie Mar 11 '16 at 10:24
  • @JanTuroň: why did you delete your answer? – chqrlie Mar 13 '16 at 23:45
  • @chqrlie in my deleted answer I pointed out some issues that turned out to be related to my text editor, not to the conversion itself. I need to find a more reliable editor and do some more study. I will let you know when I'm done and possibly undelete my answer. It may take a couple of days as my TODO list is far from empty. You can see deleted answers, so maybe you can already see what is wrong? – Jan Turoň Mar 14 '16 at 01:28
  • @JanTuroň: I can see your various investigations, but I cannot do any testing on Windows, I stopped wasting energy on this broken^H^H^H^H^H^H unfriendly platform many years ago. There are so many quirks bugging C programmers to the point that the language they end up using is not really standard anymore. Regarding a decent editor, I have been using my own version of emacs, that is ridiculously anal about actual file contents. I hear sublime text does a good job, but I cannot be sure. – chqrlie Mar 14 '16 at 01:49

2 Answers


I am not a Windows user, but you might try this:

const wchar_t *str = L"příšerně žluťoučký kůň úpěl ďábelské ódy";
FILE *fd = fopen("file.txt", "w,ccs=UTF-8");
fwprintf(fd, L"%ls", str);
fclose(fd);

I got this idea from this question: How do I write a UTF-8 encoded string to a file in windows, in C++

chqrlie

I figured this out. The internal use of wcrtomb (mentioned in the details of my question) needs a setlocale call, but that call fails with UTF-8 on Windows. So I used the WinAPI instead:

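char output[100]; // UTF-8 bytes, not wchar_t
// with a NULL buffer the call only returns the required size, trailing NUL included
int len = WideCharToMultiByte(CP_UTF8,0,str,-1,NULL,0,NULL,NULL);
if(len>0 && len<=(int)sizeof(output)) { // a too-small buffer would make the conversion fail
    WideCharToMultiByte(CP_UTF8,0,str,-1,output,len,NULL,NULL);
    fputs(output,fd);
}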

And voilà! The file is 56B long with the expected UTF-8 contents:

příšerně žluťoučký kůň úpěl ďábelské ódy
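For comparison, here is a minimal hand-rolled encoder that does the same conversion without the WinAPI. It is a substitute sketch, not what `WideCharToMultiByte` does internally: `wcs_to_utf8` is a hypothetical name, lone surrogates are not rejected, and the surrogate-pair branch only matters where `wchar_t` is 16 bits wide.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: encode the wide string src as UTF-8 into dst.
   Returns the number of bytes written, excluding the terminating NUL,
   or (size_t)-1 if dst (capacity cap) is too small or a value is out of range. */
size_t wcs_to_utf8(const wchar_t *src, char *dst, size_t cap)
{
    size_t n = 0;
    assert(src != NULL && dst != NULL);
    if (cap == 0)
        return (size_t)-1;
    while (*src != L'\0') {
        unsigned long cp = (unsigned long)*src++;
        size_t need;
        /* Combine a UTF-16 surrogate pair into one code point. */
        if (cp >= 0xD800 && cp <= 0xDBFF &&
            (unsigned long)*src >= 0xDC00 && (unsigned long)*src <= 0xDFFF)
            cp = 0x10000 + ((cp - 0xD800) << 10) + ((unsigned long)*src++ - 0xDC00);
        if (cp > 0x10FFFF)
            return (size_t)-1;
        need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (n + need >= cap)                /* keep one byte for the NUL */
            return (size_t)-1;
        if (need == 1) {
            dst[n++] = (char)cp;
        } else if (need == 2) {
            dst[n++] = (char)(0xC0 | (cp >> 6));
            dst[n++] = (char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            dst[n++] = (char)(0xE0 | (cp >> 12));
            dst[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            dst[n++] = (char)(0x80 | (cp & 0x3F));
        } else {
            dst[n++] = (char)(0xF0 | (cp >> 18));
            dst[n++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            dst[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            dst[n++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    dst[n] = '\0';
    return n;
}
```

The NUL-terminated result can be passed to `fputs(output, fd)` just like the buffer filled by `WideCharToMultiByte`.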

I hope this saves Windows coders some nerves.

Jan Turoň
  • Wide character support seems hopelessly broken because of `setlocale()`'s lack of standard arguments. Too bad it is the default in Windows. Look at the answers to this question for hints: http://stackoverflow.com/questions/3973582/how-do-i-write-a-utf-8-encoded-string-to-a-file-in-windows-in-c – chqrlie Mar 13 '16 at 01:51