
I have a string that may or may not contain Unicode characters, and I am trying to write it to a file on Windows. Below is a sample of my code. My problem is that when I fopen the file and read the values back out, they are all interpreted as UTF-16 characters.

#include <stdio.h>
#include <string.h>

const char* x = "Fool";
FILE* outFile = fopen( "Serialize.pef", "w+,ccs=UTF-8");
fwrite(x,strlen(x),1,outFile);
fclose(outFile);

char buffer[12];
outFile = fopen( "Serialize.pef", "r,ccs=UTF-8");
size_t n = fread(buffer, 1, sizeof buffer - 1, outFile); /* leave room for '\0' */
buffer[n] = '\0';
fclose(outFile);

The characters are also interpreted as UTF-16 if I open the file in WordPad etc. What am I doing wrong?

NSA

2 Answers


Yes: when you specify that the text file should be encoded in UTF-8, the CRT implicitly assumes that you'll be writing Unicode text, i.e. wchar_t strings, to the file. Anything else wouldn't make sense, since then you wouldn't need UTF-8. This will work properly:

#include <stdio.h>
#include <wchar.h>

const wchar_t* x = L"Fool";
FILE* outFile = fopen( "Serialize.txt", "w+,ccs=UTF-8");
fwrite(x, wcslen(x) * sizeof(wchar_t), 1, outFile);
fclose(outFile);

Or:

const char* x = "Fool";
FILE* outFile = fopen( "Serialize.txt", "w+,ccs=UTF-8");
// %hs is MSVC's format for a narrow (char*) string in a wide printf call
fwprintf(outFile, L"%hs", x);
fclose(outFile);
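
To see why the question's narrow "Fool" comes out looking like UTF-16, here is a portable sketch that mimics the conversion. It assumes the behavior Microsoft documents for fwrite on a stream opened with ccs=UTF-8: the buffer is taken to hold UTF-16 (wchar_t) code units, which the CRT then encodes as UTF-8. It also assumes little-endian byte order, as on Windows. The names misread_as_utf16 and utf16_to_utf8 are hypothetical helpers, not CRT functions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Encode UTF-16 code units as UTF-8. Unpaired surrogates become U+FFFD.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Combine a high/low surrogate pair into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // lone surrogate
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Mimic passing narrow bytes to fwrite on a ccs=UTF-8 stream: every two bytes
// are reinterpreted as one little-endian UTF-16 unit, then encoded as UTF-8.
std::string misread_as_utf16(const std::string& bytes) {
    // MSVC documents an odd byte count as invalid in this mode; if asserts
    // are disabled, this sketch simply drops the trailing byte.
    assert(bytes.size() % 2 == 0);
    std::u16string units;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        unsigned lo = static_cast<unsigned char>(bytes[i]);
        unsigned hi = static_cast<unsigned char>(bytes[i + 1]);
        units += static_cast<char16_t>(lo | (hi << 8));
    }
    return utf16_to_utf8(units);
}
```

For "Fool", the four bytes become the two code units 0x6F46 and 0x6C6F, so the file receives the six bytes E6 BD 86 E6 B1 AF (two CJK ideographs) instead of four ASCII letters. With a wchar_t string, as in the first snippet, each unit is a real character and comes out as the expected UTF-8.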
Hans Passant
  • Of course you'll be writing Unicode text to the file, but the point is that the CRT assumes you'll be writing **UTF-16**. – dan04 Oct 20 '10 at 03:20
  • @dan - no, it assumes you'll be writing wchar_t. That it is utf-16 on Windows is an implementation detail. – Hans Passant Oct 20 '10 at 03:31

It is easy if you can use C++11, which adds the <codecvt> header with conversion facets such as std::codecvt_utf8 for exactly this problem (note that these facets were deprecated in C++17).
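
A minimal C++11 sketch of that approach, assuming a standard library that still ships the deprecated std::codecvt_utf8 facet (libstdc++ does, with a deprecation warning under C++17 and later). The facet converts each wchar_t to UTF-8 on output; the file is opened in binary mode so Windows does not translate \n. write_utf8_cxx11 and read_bytes are hypothetical helper names.

```cpp
#include <cassert>
#include <codecvt>
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

// Write a wide string to `path` as UTF-8 using the C++11 <codecvt> facet.
bool write_utf8_cxx11(const std::string& path, const std::wstring& text) {
    std::wofstream fs;
    // Imbue before opening so the conversion facet is used from the first byte.
    // The locale takes ownership of the facet allocated with new.
    fs.imbue(std::locale(std::locale(), new std::codecvt_utf8<wchar_t>));
    fs.open(path, std::ios::out | std::ios::binary);
    if (!fs) return false;
    fs << text;
    fs.close();  // flushes the buffer; a failed flush or close sets failbit
    return !fs.fail();
}

// Read a file back as raw bytes, e.g. to inspect what was written.
std::string read_bytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Writing L"Fool\u00e9" this way should produce the bytes of "Fool" followed by C3 A9. If you also want a BOM, the facet accepts std::generate_header as its third template argument.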

But if you need multi-platform code that also builds with older standards, you can write through streams like this:

  1. Read the article about the UTF converter for streams.
  2. Add stxutif.h from the article's sources to your project.
  3. Open the file as a narrow (char) stream in binary mode and write the BOM at the start, like this:

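    std::ofstream fs;
    fs.open(filepath, std::ios::out|std::ios::binary);
    
    // UTF-8 byte order mark
    const unsigned char smarker[3] = { 0xEF, 0xBB, 0xBF };
    
    // write() sends exactly 3 bytes; operator<< would treat the array
    // as a NUL-terminated string and read past its end
    fs.write(reinterpret_cast<const char*>(smarker), sizeof smarker);
    fs.close();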
    
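  4. Then reopen the file in append mode as a wide stream imbued with the UTF-8 facet, and write your content:

    std::wofstream fs;
    // Imbue before opening so the facet is in place before any output;
    // utf8cvt comes from stxutif.h
    std::locale utf8_locale(std::locale(), new utf8cvt<false>);
    fs.imbue(utf8_locale);
    fs.open(filepath, std::ios::out|std::ios::app);
    
    fs << L"Your text";  // write any wide-character content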
    
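
If your text is already UTF-8 in a narrow std::string (as the question's char* string may be), steps 3 and 4 can be collapsed into one binary write with no conversion facet and no third-party header. A minimal sketch; write_utf8_with_bom and read_bytes are hypothetical helper names:

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>

// Write a UTF-8 BOM followed by text that is already UTF-8 encoded.
// Binary mode: bytes reach the file unchanged, with no \n translation.
bool write_utf8_with_bom(const std::string& path, const std::string& utf8_text) {
    static const char bom[] = {'\xEF', '\xBB', '\xBF'};
    std::ofstream fs(path, std::ios::out | std::ios::binary);
    if (!fs) return false;
    fs.write(bom, sizeof bom);
    fs.write(utf8_text.data(), static_cast<std::streamsize>(utf8_text.size()));
    fs.close();
    return !fs.fail();
}

// Read a file back as raw bytes, e.g. to inspect what was written.
std::string read_bytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Drop the BOM write if the program reading the file does not need it; the remaining bytes are identical either way.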
Dr1Ku
Yarkov Anton
  • How do you do it with C++11? – aCuria Mar 22 '13 at 05:30
  • Why do you need the BOM? I read that it is neither required nor even recommended in UTF-8, since it has no meaning there. Is writing the BOM required on Windows, or can it be avoided entirely? – Germán Diago Oct 03 '14 at 05:42
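
On the BOM question: in UTF-8 a BOM carries no byte-order information, and the Unicode Standard permits it but neither requires nor recommends it. It mainly acts as a signature for programs that guess a file's encoding, which some Windows tools do. If files may arrive with or without one, a reader can simply skip it. A minimal sketch; strip_utf8_bom is a hypothetical helper:

```cpp
#include <cassert>
#include <string>

// Return `text` without a leading UTF-8 byte order mark (EF BB BF), if present.
std::string strip_utf8_bom(const std::string& text) {
    static const std::string bom = "\xEF\xBB\xBF";
    // compare() with pos 0 never throws; shorter inputs simply compare unequal.
    if (text.compare(0, bom.size(), bom) == 0) {
        return text.substr(bom.size());
    }
    return text;
}
```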