Boost C++ cross-platform (Windows & Mac) serialization of std::wstring

Question

I am implementing serialization using Boost C++ libraries in a program that is built for Windows (using Visual Studio 2008) and Mac (using GCC). The program uses wide strings (std::wstring) in about 30 of its classes. Depending on the platform, when I save to a file (by means of boost::archive::text_woarchive), the wide strings are represented differently within the output file.

Saved under Windows:

H*e*l*l*o* *W*o*r*l*d*!* ...

Saved under MacOSX:

H***e***l***l***o*** ***W***o***r***l***d***!*** ...

where * is a NULL character.

When I try to read a file created under Windows using the Mac build (and vice versa), my program crashes.

From my understanding so far, Windows natively uses 2 bytes per wide character while MacOSX (and I suppose Unix in general) uses 4 bytes.
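
As a small illustration of that size difference, here is a minimal sketch. `raw_bytes` is a hypothetical helper, not part of Boost or the standard library. It copies the in-memory representation of a wide string into a byte vector, which is roughly what lands in a file when each wchar_t is written with no character conversion.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: returns the raw in-memory bytes of a wide string.
// The number of bytes per character is sizeof(wchar_t), which differs
// between compilers and platforms.
std::vector<unsigned char> raw_bytes(const std::wstring& s)
{
    std::vector<unsigned char> bytes(s.size() * sizeof(wchar_t));
    if (!bytes.empty())
        std::memcpy(&bytes[0], s.data(), bytes.size());
    return bytes;
}
```

With Visual C++, where wchar_t is 2 bytes, `raw_bytes(L"Hi").size()` would be 4. With GCC on Mac or Linux, where wchar_t is typically 4 bytes, it would be 8. That matches the one versus three NUL bytes after each letter in the dumps above.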

I have come across possible solutions such as utf8_codecvt_facet.cpp, UTF8-CPP, ICU, and Dinkumware, but I have yet to see an example that will work with what I already have (I would prefer not to rewrite five months of serialization work at this point):

std::wofstream ofs( "myOutputFile" );
boost::archive::text_woarchive oa( ... );
//... what do I put here? ...
oa << myMainClass;

myMainClass contains wide strings and Boost smart pointers to other classes that, in turn, get serialized.

Is there a way to add your own specialization of the load and save functions for wstring? – bames53, Dec 12 '11 at 22:22
What do you mean? Splitting serialization into separate `load` and `save` functions? I do know how to do that, but I am not exactly certain what type of conversions to perform on wstrings if I were to write those functions. – Tymek, Dec 13 '11 at 02:17
I would go with bames53's idea: write a specialization of the boost::serialization routine for wstring. That way you can choose either 2 or 4 bytes per character and stick with it for both platforms. – fileoffset, Dec 13 '11 at 04:07
@Tymek No, my mention of load and save functions was only incidental. I just meant that you might override the default serialization function with your own code which, for example, converts the wstring to a UTF-8 string for serialization. For example, if wstring serialization is implemented via a template, you could create your own template specialization for wstring. – bames53, Dec 13 '11 at 04:13
You have to decide on an exchange format (UTF-8, UTF-16, UTF-16BE, UTF-16LE, UTF-32...). – curiousguy, Dec 13 '11 at 16:18
@curiousguy It's UTF-8. I am still trying to figure out how to store data in UTF-8 format. Any examples that work with Boost serialization? – Tymek, Dec 13 '11 at 17:08
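
The wstring-to-UTF-8 conversion suggested in the comments above could look roughly like the sketch below. `wide_to_utf8` is a hypothetical helper, not a Boost function. It assumes that wchar_t holds UTF-16 code units where it is 2 bytes wide (Windows) and UTF-32 code points where it is 4 bytes wide (typical for GCC on Mac and Linux), and it replaces invalid input with U+FFFD.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: encode a std::wstring as UTF-8 so that the byte
// sequence is the same regardless of sizeof(wchar_t).
std::string wide_to_utf8(const std::wstring& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        unsigned long cp = static_cast<unsigned long>(in[i]);

        // Combine a UTF-16 surrogate pair (high, then low) into one code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            unsigned long lo = static_cast<unsigned long>(in[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;  // the low surrogate has been consumed
            }
        }

        // Unpaired surrogates and out-of-range values become U+FFFD.
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            cp = 0xFFFD;

        if (cp < 0x80) {                       // 1 byte
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

In a class with split `save` and `load` functions, the resulting std::string could be written to the archive in place of the wide string. Loading would need the inverse conversion, which is not shown here.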

score 2 · Answer 1 · answered Dec 13 '11 at 08:22

wofstream is defined as:

typedef basic_ofstream<wchar_t, char_traits<wchar_t> > wofstream;

On Linux, you need to declare a custom ofstream type to deal with 16-bit characters. This can be done as follows:

// std::uint16_t comes from <cstdint>; basic_ofstream from <fstream>
typedef std::uint16_t Char16_t;
typedef std::basic_ofstream<Char16_t, std::char_traits<Char16_t> > wofstream_16;

Now wofstream_16 can be used seamlessly on different platforms to deal with 16-bit wide chars.

vine'th

Thanks vine'th. I tried to add this but I am getting compilation issues: `error: no matching function for call to boost::archive::text_woarchive::text_woarchive(SaveSession()::wofstream_16&)` ... `candidates are: boost::archive::text_woarchive::text_woarchive(std::wostream&, unsigned int)` ... `boost::archive::text_woarchive::text_woarchive(const boost::archive::text_woarchive&)` Any ideas? – Tymek Dec 13 '11 at 17:09
I think you need to add an overloaded << operator to `myMainClass` that accepts `wofstream_16&`. (I am guessing; it is tough to pinpoint the problem without the source.) HTH. – vine'th Dec 15 '11 at 07:43

score 0 · Answer 2 · answered Jul 30 '12 at 17:32

There is a simple solution to this that works for me. It was just a matter of understanding these statements in the official documentation and turning them into C++ syntax:

(1) Open a wide character stream.
(2) Alter the stream locale to use boost::archive::codecvt_null.
(3) Create the archive with the flag no_codecvt.

So everything together looks like this (output to file):

#include <fstream>
#include <locale>

#include <boost/archive/codecvt_null.hpp>
#include <boost/archive/text_woarchive.hpp>
#include <boost/archive/text_wiarchive.hpp>

// (1)
std::wofstream ofs( "myOutputFile.dat" );

// (2)
std::locale loc( ofs.getloc(), new boost::archive::codecvt_null<std::ostream::char_type>() );
ofs.imbue( loc );

// (3) (note text_woarchive)
boost::archive::text_woarchive oa( ofs, boost::archive::no_codecvt );

oa << myMainClass;

The same idea would apply for file input:

std::wifstream ifs( "myInputFile.dat" );

std::locale loc( ifs.getloc(), new boost::archive::codecvt_null<std::ostream::char_type>() );
ifs.imbue( loc );

boost::archive::text_wiarchive ia( ifs, boost::archive::no_codecvt );

ia >> myMainClass;

The output files on both platforms are now identical and stored as UTF-8.
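
A wide file stream converts its characters through the `std::codecvt<wchar_t, char, std::mbstate_t>` facet of its locale, so text beyond plain ASCII may need an explicit UTF-8 facet on the stream. The sketch below shows the same imbue-then-write pattern using only the standard library. It assumes a C++11 compiler with `<codecvt>` (deprecated since C++17 and not available in Visual Studio 2008). `write_utf8` and `read_utf8` are hypothetical helper names.

```cpp
#include <codecvt>   // std::codecvt_utf8 (C++11, deprecated since C++17)
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

// Hypothetical helper: write a wide string to a file as UTF-8 bytes.
bool write_utf8(const char* path, const std::wstring& text)
{
    std::wofstream ofs;
    // Imbue before opening so the facet is in place for every write.
    // The locale takes ownership of the facet allocated with new.
    ofs.imbue(std::locale(std::locale::classic(),
                          new std::codecvt_utf8<wchar_t>));
    ofs.open(path, std::ios::binary);
    if (!ofs)
        return false;
    ofs << text;
    ofs.flush();
    return ofs.good();
}

// Hypothetical helper: read a whole UTF-8 file back into a wide string.
std::wstring read_utf8(const char* path)
{
    std::wifstream ifs;
    ifs.imbue(std::locale(std::locale::classic(),
                          new std::codecvt_utf8<wchar_t>));
    ifs.open(path, std::ios::binary);
    return std::wstring(std::istreambuf_iterator<wchar_t>(ifs),
                        std::istreambuf_iterator<wchar_t>());
}
```

A stream prepared this way could then be handed to the archive with the no_codecvt flag, as in step (3), so that the archive leaves the imbued locale alone. Where wchar_t is 16 bits wide, `std::codecvt_utf8<wchar_t>` covers only the Basic Multilingual Plane.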
