
I've got a wide string and I'm writing it to a wofstream that I opened in out|binary mode. When I look at the resulting file, it's missing every other byte.

I expected that opening the file in Visual Studio's binary editor would show every other byte as a zero, but the zeros aren't there.

Do you know what I'm missing?

Thanks.


The code is something like this:

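CAtlStringW data = L"some data";
// Open a wide output stream in binary mode
wofstream stream("c:\\hello.txt", ios_base::out | ios_base::binary);
// Write GetLength() wide characters from the string's buffer
stream.write( data.GetBuffer(), data.GetLength() );
stream.close();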
ST3
Scott Langham
  • Can you show us some code please – Cthutu Mar 04 '10 at 15:15
  • Duplicate of http://stackoverflow.com/questions/1509277/why-does-wide-file-stream-in-c-narrow-written-data-by-default – Martin York Mar 04 '10 at 15:54
  • Potential solution here: http://stackoverflow.com/questions/207662/writing-utf16-to-file-in-binary-mode/208431#208431 – Martin York Mar 04 '10 at 15:55
  • Short answer: Because the IOstream library is broken. ;) Wide streams simply take a string of wide characters and convert them to regular char, then write those to the stream. So `L"Hello world"` gets written out as `"Hello world"`. Ridiculous, but true. – jalf Mar 04 '10 at 16:10
  • @jalf: No, it isn't. It's designed that way because not all systems support Unicode on disk, and therefore the standard assumes on-disk formats are in ANSI even if things are Unicode internally. – Billy ONeal Mar 04 '10 at 23:05
  • @BillyONeal Why does it assume that in the first place? Shouldn't that be the special case? – Khaled Alshaya Mar 09 '10 at 20:38
  • @AraK: Because that's normal behavior -- the vast majority of systems do not recognize wide characters. Furthermore, all interactions on wide characters are implementation defined, and the C++ standard would like files produced on one platform to be usable on other platforms. – Billy ONeal Mar 09 '10 at 23:27

2 Answers

1

See the "Community Content" at the bottom of the page http://msdn.microsoft.com/en-us/library/f1d6b0fk(VS.80).aspx. In short, you have to call pubsetbuf() so that your stream uses a wchar_t-based internal buffer instead of a char-based one.
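
Note that the standard leaves the effect of pubsetbuf() on a file stream implementation-defined for anything other than requesting an unbuffered stream, so the sketch below deliberately uses a different technique rather than the one described on that page: it writes the string's in-memory bytes through a narrow std::ofstream, which performs no character conversion. The helper name write_raw_wide_bytes and the file path are made up for illustration, and std::wstring stands in for CAtlStringW.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper (not from the original post): writes the in-memory
// bytes of `text` unchanged and returns the resulting file size in bytes.
std::size_t write_raw_wide_bytes(const std::wstring& text, const char* path) {
    {
        // A narrow stream does no wchar_t -> char conversion.
        std::ofstream out(path, std::ios_base::out | std::ios_base::binary);
        out.write(reinterpret_cast<const char*>(text.data()),
                  static_cast<std::streamsize>(text.size() * sizeof(wchar_t)));
    }   // the stream is flushed and closed here

    // Reopen positioned at the end to read the size back.
    std::ifstream in(path, std::ios_base::in | std::ios_base::binary | std::ios_base::ate);
    const std::streamoff size = in.tellg();
    return static_cast<std::size_t>(size);
}
```

Calling write_raw_wide_bytes(L"some data", "raw_wide_demo.bin") should produce a file of 9 * sizeof(wchar_t) bytes. On Windows, where wchar_t is 2 bytes, that is the layout with a zero after every ASCII byte that the question expected. No byte-order mark is written, and the byte order is whatever the machine uses.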

0

When you write to a file through a wide output stream, what actually happens is that the stream converts the wide characters to some 8-bit encoding.

With a UTF-8 locale it would convert wide strings to UTF-8 encoded text, but MSVC does not provide UTF-8 locales, so it generally tries to convert to a code page such as cp1251 or to ASCII.
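
A quick way to see this conversion is to count the bytes that actually reach the disk. This is a minimal sketch assuming the default "C" locale and ASCII-only text. The helper name bytes_written_by_wofstream and the file path are made up for illustration, and std::wstring stands in for CAtlStringW.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper (not from the original post): writes `text` through a
// wide stream and returns how many bytes ended up in the file.
std::size_t bytes_written_by_wofstream(const std::wstring& text, const char* path) {
    {
        // Binary mode only disables newline translation; the stream's
        // codecvt facet still converts each wchar_t to narrow chars.
        std::wofstream out(path, std::ios_base::out | std::ios_base::binary);
        out.write(text.data(), static_cast<std::streamsize>(text.size()));
    }   // the stream is flushed and closed here

    // Reopen positioned at the end to read the size back.
    std::ifstream in(path, std::ios_base::in | std::ios_base::binary | std::ios_base::ate);
    const std::streamoff size = in.tellg();
    return static_cast<std::size_t>(size);
}
```

For L"some data" this should report 9 bytes, one per character, rather than 9 * sizeof(wchar_t), which is why the binary editor shows no zero bytes.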

Artyom