
I'm using the Shark machine learning library, and it outputs its classifiers to file by using the boost::archive::polymorphic_text_(io)archive classes.

I'm also creating a bag-of-words model, which I need to write to file using my own code.

Ideally, I would like to output this to the same file as the classifier. Is it possible to write other things to the same file that a polymorphic text archive is written to? Is it enough to just pass the fstream, positioned at the point where the archive begins?

Edit: Just to be slightly clearer: Does Boost support me putting other things in a file alongside these archives?

AdmiralJonB

1 Answer


First Off: Streams Are Not Archives.

My first reaction would be "have you tried it?". But I was intrigued and couldn't find anything about this in the documentation, so I ran a few tests myself:

  • the answer seems to be "No": it's not supported
  • in practice, it seems to work for binary archives
  • it seems to break down with xml/text archives, because they leave trailing 0xa (newline) characters in the input buffer. These pose no problem if the "next" archive to be read is text as well, but they obviously break a binary archive that follows.

Here's my tester:

Live On Coliru

#include <boost/archive/binary_iarchive.hpp>
#include <boost/archive/text_iarchive.hpp>
#include <boost/archive/xml_iarchive.hpp>
#include <boost/archive/binary_oarchive.hpp>
#include <boost/archive/text_oarchive.hpp>
#include <boost/archive/xml_oarchive.hpp>
#include <boost/serialization/nvp.hpp> // BOOST_SERIALIZATION_NVP
#include <cassert>                     // assert
#include <iostream>                    // std::cout

int data = 42;

template <typename Ar>
void some_output(std::ostream& os)
{
    std::cout << "Writing archive at " << os.tellp() << "\n";
    Ar ar(os);
    ar << BOOST_SERIALIZATION_NVP(data);
}

template <typename Ar>
void roundtrip(std::istream& is)
{
    data = -1;
    std::cout << "Reading archive at " << is.tellg() << "\n";
    Ar ar(is);
    ar >> BOOST_SERIALIZATION_NVP(data);
    assert(data == 42);
}

#include <sstream>

int main()
{
    std::stringstream ss;

    //some_output<boost::archive::text_oarchive>(ss); // this derails the binary archive that follows
    some_output<boost::archive::binary_oarchive>(ss);
    some_output<boost::archive::xml_oarchive>(ss);
    some_output<boost::archive::text_oarchive>(ss);

    //roundtrip<boost::archive::text_iarchive>(ss);
    roundtrip<boost::archive::binary_iarchive>(ss);
    roundtrip<boost::archive::xml_iarchive>(ss);
    roundtrip<boost::archive::text_iarchive>(ss);

    // just to prove that there's remaining whitespace
    std::cout << "remaining: ";
    char ch;
    while (ss>>std::noskipws>>ch)
        std::cout << " " << std::showbase << std::hex << ((int)(ch));
    std::cout << "\n";

    // of course, anything else will fail:
    try {
        roundtrip<boost::archive::text_iarchive>(ss);
    } catch(boost::archive::archive_exception const& e)
    {
        std::cout << "Can't deserialize from a stream at EOF: " << e.what();
    }
}

Prints:

Writing archive at 0
Writing archive at 44
Writing archive at 242
Reading archive at 0
Reading archive at 44
Reading archive at 240
remaining:  0xa
Reading archive at 0xffffffffffffffff
Can't deserialize from a stream at EOF: input stream error
sehe
  • Thanks, great answer! I hadn't got around to testing it, and I couldn't find any real documentation about how it is stored in the file, so I wasn't too sure how to go about testing it. Also, I'd never heard of Coliru before; it seems like a pretty good tool. – AdmiralJonB Dec 11 '14 at 15:09