
I am implementing serialization using Boost C++ libraries in a program that is built for Windows (using Visual Studio 2008) and Mac (using GCC). The program uses wide strings (std::wstring) in about 30 of its classes. Depending on the platform, when I save to a file (by means of boost::archive::text_woarchive), the wide strings are represented differently within the output file.

Saved under Windows:

H*e*l*l*o* *W*o*r*l*d*!* ...

Saved under MacOSX:

H***e***l***l***o*** ***W***o***r***l***d***!*** ...

where * is a null character (\0).

When I try to read a file created under Windows using the Mac build (and vice versa), my program crashes.

From my understanding so far, Windows natively uses 2 bytes per wide character while MacOSX (and I suppose Unix in general) uses 4 bytes.
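
One way to confirm this on each build is to look at the in-memory bytes of a wide string. The following is a minimal sketch; `raw_bytes` is a hypothetical helper name, not part of Boost or the standard library, and it only copies whatever layout the compiler uses for wchar_t:

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: returns a copy of the bytes a std::wstring occupies
// in memory, so the per-character width on this platform becomes visible.
std::vector<unsigned char> raw_bytes(const std::wstring& s) {
    std::vector<unsigned char> out(s.size() * sizeof(wchar_t));
    if (!out.empty()) {
        // Copy the characters as-is; no encoding conversion takes place.
        std::memcpy(&out[0], s.data(), out.size());
    }
    return out;
}
```

For L"Hello" this returns 10 bytes when wchar_t is 2 bytes wide and 20 bytes when it is 4 bytes wide, which matches the two dumps above.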

I have come across possible solutions such as utf8_codecvt_facet.cpp, UTF8-CPP, ICU, and Dinkumware, but I have yet to see an example that works with what I already have (I would prefer not to rewrite five months of serialization work at this point):

std::wofstream ofs( "myOutputFile" );
boost::archive::text_woarchive oa( ... );
//... what do I put here? ...
oa << myMainClass;

myMainClass contains wide strings and Boost smart pointers to other classes that, in turn, get serialized.

Tymek
  • Is there a way to add your own specialization of the load and save functions for wstring? – bames53 Dec 12 '11 at 22:22
  • What do you mean? Splitting serialization into separate `load` and `save` functions? I do know how to do that, but I am not exactly certain what type of conversions to perform on wstrings if I were to write those functions. – Tymek Dec 13 '11 at 02:17
  • I would go with bames53 idea, write a specialization of the boost::serialization routine for wstring. That way you can choose either 2 or 4 bytes per character and stick with it for both platforms. – fileoffset Dec 13 '11 at 04:07
  • @Tymek No, my mention of load and save functions was only incidental. I just meant that you might override the default serialization function with your own code which, for example, converts the wstring to a UTF-8 string for serialization. For example if wstring serialization is implemented via a template, you could create your own template specialization for wstring. – bames53 Dec 13 '11 at 04:13
  • You have to decide on an exchange format (UTF-8, UTF-16, UTF-16BE, UTF-16LE, UTF-32...). – curiousguy Dec 13 '11 at 16:18
  • @curiousguy It's UTF-8. I am still trying to figure out how to store data in UTF-8 format. Any examples that work with Boost serialization? – Tymek Dec 13 '11 at 17:08
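
The conversion discussed in these comments (wstring to a UTF-8 std::string and back) can be sketched without any external library. This is only an illustration under stated assumptions: `append_utf8`, `wide_to_utf8`, and `utf8_to_wide` are hypothetical names, the wide string is assumed to hold UTF-16 when wchar_t is 2 bytes and UTF-32 when it is 4 bytes, and the decoder assumes well-formed input and does no validation.

```cpp
#include <cstddef>
#include <string>

// Appends one Unicode code point to `out` as 1 to 4 UTF-8 bytes.
static void append_utf8(std::string& out, unsigned long cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: wstring -> UTF-8 bytes, independent of sizeof(wchar_t).
std::string wide_to_utf8(const std::wstring& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        unsigned long cp = static_cast<unsigned long>(in[i]);
        // With a 2-byte wchar_t (UTF-16), merge a surrogate pair into one code point.
        if (sizeof(wchar_t) == 2 && cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            const unsigned long lo = static_cast<unsigned long>(in[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        append_utf8(out, cp);
    }
    return out;
}

// Hypothetical helper: UTF-8 bytes -> wstring. Assumes well-formed input.
std::wstring utf8_to_wide(const std::string& in) {
    std::wstring out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        unsigned long cp;
        std::size_t extra;  // number of continuation bytes that follow
        if (c < 0x80)      { cp = c;        extra = 0; }
        else if (c < 0xE0) { cp = c & 0x1F; extra = 1; }
        else if (c < 0xF0) { cp = c & 0x0F; extra = 2; }
        else               { cp = c & 0x07; extra = 3; }
        ++i;
        for (; extra > 0 && i < in.size(); --extra, ++i) {
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
        }
        if (sizeof(wchar_t) == 2 && cp >= 0x10000) {
            // Split into a UTF-16 surrogate pair on 2-byte wchar_t platforms.
            cp -= 0x10000;
            out += static_cast<wchar_t>(0xD800 + (cp >> 10));
            out += static_cast<wchar_t>(0xDC00 + (cp & 0x3FF));
        } else {
            out += static_cast<wchar_t>(cp);
        }
    }
    return out;
}
```

A class with separate save and load functions could then serialize the std::string returned by `wide_to_utf8` in place of the std::wstring member and call `utf8_to_wide` when loading, so the archive never contains platform-sized wide characters.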

2 Answers


std::wofstream is declared as: typedef basic_ofstream<wchar_t, char_traits<wchar_t> > wofstream;

On Linux, you need to declare a custom ofstream type to deal with 16-bit characters. This can be done as follows:

#include <cstdint>   // std::uint16_t
#include <fstream>   // std::basic_ofstream
typedef std::uint16_t Char16_t;
typedef std::basic_ofstream<Char16_t, std::char_traits<Char16_t> > wofstream_16;

Now wofstream_16 can be used on different platforms to deal with 16-bit wide chars.

vine'th
  • Thanks vine'th. I tried to add this but I am getting compilation issues: `error: no matching function for call to boost::archive::text_woarchive::text_woarchive(SaveSession()::wofstream_16&)` ... `candidates are: boost::archive::text_woarchive::text_woarchive(std::wostream&, unsigned int)` ... `boost::archive::text_woarchive::text_woarchive(const boost::archive::text_woarchive&)` Any ideas? – Tymek Dec 13 '11 at 17:09
  • I think you need to add an overloaded << operator to `myMainClass`, which accepts `wofstream_16&` (Guessing, tough to pinpoint the problem without the source.) HTH. – vine'th Dec 15 '11 at 07:43

There is a simple solution to this that works for me. It was just a matter of understanding these statements in the official documentation and turning them into C++ syntax:

  1. Open a wide character stream.
  2. Alter the stream locale to use boost::archive::codecvt_null
  3. Create the archive with the flag no_codecvt.

So everything together looks like this (output to file):

#include <fstream>
#include <locale>

#include <boost/archive/codecvt_null.hpp>
#include <boost/archive/text_woarchive.hpp>
#include <boost/archive/text_wiarchive.hpp>

// (1)
std::wofstream ofs( "myOutputFile.dat" );

// (2)
std::locale loc( ofs.getloc(), new boost::archive::codecvt_null<std::ostream::char_type>() );
ofs.imbue( loc );

// (3) (note text_woarchive)
boost::archive::text_woarchive oa( ofs, boost::archive::no_codecvt );

oa << myMainClass;

The same idea would apply for file input:

std::wifstream ifs( "myInputFile.dat" );

std::locale loc( ifs.getloc(), new boost::archive::codecvt_null<std::ostream::char_type>() );
ifs.imbue( loc );

boost::archive::text_wiarchive ia( ifs, boost::archive::no_codecvt );

ia >> myMainClass;

The output files on both platforms are now identical and stored as UTF-8.
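
The idea behind step (2), that the facet in the stream's locale decides which bytes reach the file, can also be seen without Boost. Below is a minimal sketch, assuming a C++11 standard library that still ships <codecvt> (std::codecvt_utf8 is deprecated since C++17). It uses std::codecvt_utf8 rather than boost::archive::codecvt_null, and `write_utf8_file` and `read_raw_bytes` are hypothetical helper names:

```cpp
#include <codecvt>   // std::codecvt_utf8 (C++11; deprecated since C++17)
#include <fstream>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: writes `text` through a wide stream whose locale
// carries a UTF-8 conversion facet, so the file holds UTF-8 bytes.
bool write_utf8_file(const std::string& path, const std::wstring& text) {
    std::wofstream ofs;
    // Imbue before opening so the facet is in place for the first write.
    // The locale takes ownership of the facet allocated with new.
    ofs.imbue(std::locale(std::locale::classic(), new std::codecvt_utf8<wchar_t>));
    ofs.open(path.c_str(), std::ios::binary);
    if (!ofs) {
        return false;
    }
    ofs << text;
    ofs.close();  // flush now so a conversion failure is reported here
    return !ofs.fail();
}

// Hypothetical helper: reads a file back as raw bytes for inspection.
std::string read_raw_bytes(const std::string& path) {
    std::ifstream ifs(path.c_str(), std::ios::binary);
    std::ostringstream buf;
    buf << ifs.rdbuf();
    return buf.str();
}
```

Writing the letter h followed by U+00E9 this way is expected to produce the three bytes 68 C3 A9 whether wchar_t is 2 or 4 bytes wide. With a 2-byte wchar_t, std::codecvt_utf8<wchar_t> covers only characters in the Basic Multilingual Plane.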

Tymek