
I program (just occasionally) in C++ with Visual Studio and MFC. I write a file with fopen and fprintf. The file should be encoded in UTF-8. Is there any way to do this? Whatever I try, the file ends up either two-byte Unicode (UTF-16) or ISO-8859-2 (Latin-2) encoded.

Glanebridge

    See other posts about Unicode in C++ http://stackoverflow.com/questions/55641/unicode-in-c – Dave Apr 05 '12 at 12:42
  • You can look at this thread: http://stackoverflow.com/questions/2543346/how-to-write-unicode-hello-world-in-c-on-windows – Jepessen Apr 05 '12 at 12:52

3 Answers


Yes, but you need Visual Studio 2005 or later. You can then pass an extra flag to fopen:

LPCTSTR strText = _T("абв");
FILE *f = fopen(pszFilePath, "w,ccs=UTF-8");
_ftprintf(f, _T("%s"), strText);

Keep in mind this is a Microsoft extension; it probably won't work with gcc or other compilers.

sashoalm
  • I don't think this will affect data written to the file using fprintf. – bames53 Apr 05 '12 at 19:41
  • You need to use _ftprintf. See the changes in my answer. – sashoalm Apr 06 '12 at 08:07
  • Or simply use fwprintf. What's going on is that `ccs=UTF-8` sets the _O_U8TEXT mode on the file, so that writing wide characters to the file will cause UTF-8 to be output. Writing narrow characters with this mode set will result in an error. – bames53 Apr 06 '12 at 14:24
  • Do you mean that you already have a buffer with UTF-8 text? In that case why not just open the file in binary mode and write the buffer to it with fwrite? – sashoalm Apr 06 '12 at 14:47
  • No, I mean that since using tprintf will only work here if TCHAR and all the T functions resolve to wchar_t functions, why not just use the wchar_t functions directly? TCHAR is only useful when a program is actually going to switch between char and wchar_t. If you don't want to use both then there's no reason to use TCHAR. `FILE* f = fopen(filename,"w,ccs=UTF-8"); fwprintf(f,L"%s",L"абв");` – bames53 Apr 06 '12 at 14:59
  • Oops, I had somehow decided you were the OP, sorry. As for fwprintf, you're right, it's better to use fwprintf. – sashoalm Apr 06 '12 at 15:16
  • I checked and that one really works! (Visual Studio 2013). Thanks! – Michael Haephrati Aug 12 '17 at 13:02

You shouldn't need to set your locale or set any special modes on the file if you just want to use fprintf. You simply have to use UTF-8 encoded strings.

#include <cstdio>
#include <codecvt>
#include <locale>
#include <string>

int main() {
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>,wchar_t> convert;
    std::string utf8_string = convert.to_bytes(L"кошка 日本国");

    if(FILE *f = fopen("tmp","w"))
        fprintf(f,"%s\n",utf8_string.c_str());
}

Save the source file as UTF-8 with signature (i.e. with a BOM) or as UTF-16; don't use UTF-8 without signature, otherwise VS won't produce the right string literal. The file written by the program will contain the UTF-8 version of that string. Or you can do:

int main() {
    if(FILE *f = fopen("tmp","w"))
        fprintf(f,"%s\n","кошка 日本国");
}

In this case you must save the source file as UTF-8 without signature, because you want the compiler to treat the source encoding as identical to the execution encoding. This is a bit of a hack that relies on what is, in my opinion, broken compiler behavior.

You can do basically the same thing with any of the other APIs for writing narrow characters to a file, but note that none of these methods work for writing UTF-8 to the Windows console. Because the C runtime and/or the console is a bit broken, you can only write UTF-8 directly to the console by calling SetConsoleOutputCP(65001) and then using one of the puts family of functions.

If you want to use wide characters instead of narrow characters then locale based methods and setting modes on file descriptors could come into play.

#include <cstdio>
#include <fcntl.h>
#include <io.h>

int main() {
    if(FILE *f = fopen("tmp","w")) {
        _setmode(_fileno(f), _O_U8TEXT);
        fwprintf(f,L"%s\n",L"кошка 日本国");
    }
}

#include <fstream>
#include <codecvt>
#include <locale>

int main() {
    if(auto f = std::wofstream("tmp")) {
        f.imbue(std::locale(std::locale(),
                new std::codecvt_utf8_utf16<wchar_t>)); // assumes wchar_t is UTF-16
        f << L"кошка 日本国\n";
    }
}
bames53
    @NicolBolas The first example uses wstring_convert from C++11, but any other method of obtaining a UTF-8 encoding works too, e.g. WideCharToMultiByte. The last example uses a C++11 codecvt facet for which there's no built-in, pre-C++11 replacement. The other two examples don't use C++11. – bames53 Apr 06 '12 at 14:19

In theory, you should simply set a locale that uses UTF-8 as its external encoding. My understanding (I'm not a Windows programmer) is that Windows has no such locale, so you have to resort to implementation-specific means or non-standard libraries (see the link in Dave's comment).

AProgrammer