
This is my current function that deserializes data received via Boost.Asio UDP transmission. It works perfectly, but the performance is poor: about 4000 calls per second use ~16% of the CPU, which is a full thread of an i7.

Running a performance test on the code shows that this line uses >95% of the CPU time:

text_iarchive LLArchive(LLStream);

My question is simple: is there a way I can re-use a text_iarchive without having to create a new one each time the function is called? (a similar thing is possible in C# with memorystreams and other variables needed to deserialise data). I've searched through the Boost documentation and no mention was made of anything like it.

What I essentially want is to put the function below in a class and have as many variables as possible defined as members that would simply be re-initialized inside the function (clearing the buffer/stream, re-setting data etc). Will this even improve the performance? Would changing the stream passed into the archive be enough to do the trick (does the archive bind to it somewhere, so that if we change the passed stream, the one the archive holds changes as well)?

Is it possible?

Thank you very much for your time!

Full function code:

using namespace boost::archive;
using namespace boost::iostreams;

Packet InboundStreamToInternalPacket(boost::array<char, 5000> inboundStream)
{
    Packet receivedPacket; 

    basic_array_source<char> arraySourceLL(inboundStream.data(), inboundStream.size());
    stream<basic_array_source<char>> LLStream(arraySourceLL);
    text_iarchive LLArchive(LLStream);

    LLArchive >> receivedPacket;

    return receivedPacket;
}

Edit 1:

I tried closing and reopening the stream as if a new source had been added, but it crashes with "boost::archive::archive_exception at memory location xxxxxx" when deserializing into the second Packet.

Packet InboundStreamToInternalPacket(boost::array<char, 5000> inboundStream)
{
    Packet receivedPacket; 
    Packet receivedPacket2;

    basic_array_source<char> arraySourceLL(inboundStream.data(), inboundStream.size());
    stream<basic_array_source<char>> LLStream;     
    LLStream.open(arraySourceLL);

    text_iarchive LLArchive(LLStream);    

    LLArchive >> receivedPacket;

    LLStream.close();

    LLStream.open(arraySourceLL);

    LLArchive >> receivedPacket2;

    return receivedPacket;
}
Tanner Sansbury
AndrewVS2013
  • _"Will this even improve the performance?"_ - Your profiler will tell you. I bet you've heard about premature optimization – sehe Feb 08 '15 at 15:40
  • Re.: the edit; You can't do that, this is what I already answered before your edit. You need the archive instance. See my comment at the answer – sehe Feb 08 '15 at 16:51

1 Answer


No, there is no such way.

The comparison to MemoryStream is broken though, because the archive is a layer above the stream.

You can re-use the stream. So if you do the exact parallel of a MemoryStream, e.g. boost::iostreams::array_sink and/or boost::iostreams::array_source on a fixed buffer, you can easily reuse the buffer in your next (de)serialization.

See this proof of concept:

Live On Coliru

#include <boost/archive/text_iarchive.hpp>
#include <boost/archive/text_oarchive.hpp>
#include <boost/serialization/serialization.hpp>

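#include <boost/array.hpp>
#include <boost/iostreams/device/array.hpp>
#include <boost/iostreams/stream.hpp>
#include <boost/type_traits/is_integral.hpp>
#include <boost/type_traits/is_pod.hpp>
#include <cassert>
#include <iostream>
#include <vector>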

namespace bar = boost::archive;
namespace bio = boost::iostreams;

struct Packet {
    int i;
    template <typename Ar> void serialize(Ar& ar, unsigned) { ar & i; }
};

namespace Reader {
    template <typename T>
    Packet deserialize(T const* data, size_t size) {
        static_assert(boost::is_pod<T>::value     , "T must be POD");
        static_assert(boost::is_integral<T>::value, "T must be integral");
        static_assert(sizeof(T) == sizeof(char)   , "T must be byte-sized");

        bio::stream<bio::array_source> stream(bio::array_source(data, size));
        bar::text_iarchive ia(stream);
        Packet result;
        ia >> result;

        return result;
    }

    template <typename T, size_t N>
    Packet deserialize(T (&arr)[N]) {
        return deserialize(arr, N);
    }

    template <typename T>
    Packet deserialize(std::vector<T> const& v) {
        return deserialize(v.data(), v.size());
    }

    template <typename T, size_t N>
    Packet deserialize(boost::array<T, N> const& a) {
        return deserialize(a.data(), a.size());
    }
}

template <typename MutableBuffer>
void serialize(Packet const& data, MutableBuffer& buf)
{
    bio::stream<bio::array_sink> s(buf.data(), buf.size());
    bar::text_oarchive ar(s);

    ar << data;
}

int main() {
    boost::array<char, 1024> arr;

    for (int i = 0; i < 100; ++i) {
        serialize(Packet { i }, arr);

        Packet roundtrip = Reader::deserialize(arr);
        assert(roundtrip.i == i);
    }
    std::cout << "Done\n";
}

For general optimization of boost serialization see:

Community
sehe
  • So I tried re-initializing the stream by closing and opening it again and passing the basic_array_source but it doesn't work. Stream is empty once I close it and I can't seem to re-populate it again – AndrewVS2013 Feb 08 '15 at 16:01
  • Edit2: So from what I can see, you've templated the de-serializer and added a few assert checks for the data being passed in. However I can see that the function still defines the stream and text_iarchive. I'm sorry if I'm wrong or expecting something else from your answer! You've mentioned you can re-use the stream, but from what I can see, a new one is still declared with each call of the function. – AndrewVS2013 Feb 08 '15 at 17:24
  • What do you think the stream is? Beyond a set of pointers into the buffer (and optionally stream state that you don't have to use, such as formatting options/locale)? I don't think it's expensive to construct. And, yes you can re-use the stream too: **[see demo](http://coliru.stacked-crooked.com/a/bcaeec9cc3fa1f81)** but I assumed you wanted to reuse the objects _for performance_ and I'm pretty sure this won't perform better due to the allocations involved. Feel free to correct my assumption (preferably by updating the question wording) – sehe Feb 08 '15 at 17:51
  • And here's the demo that **[overwrites the same part of the stream each time](http://coliru.stacked-crooked.com/a/63214a453f777b31)**. I don't think it will be noticeably faster. In any case, just avoid the dynamic allocations. Use binary archives. Disable features not required (versioning, type info, headers). Benefit. (**P.S.** the [previous demo](http://coliru.stacked-crooked.com/a/bcaeec9cc3fa1f81) doesn't work out-of-the-box for binary archives, AFAIR) – sehe Feb 08 '15 at 17:55
  • I've added more links. Here is the answer where I analyzed why boost binary archives can not be deserialized from a single stream without tweaks: [Outputting more things than a Polymorphic Text Archive](http://stackoverflow.com/questions/27422557/outputting-more-things-than-a-polymorphic-text-archive/27424381#27424381) – sehe Feb 08 '15 at 18:05
  • Thank you for the answers, I now understand! Since I need to de-serialize via text archives, I will just tune the archiver as you suggested. Thanks again! – AndrewVS2013 Feb 08 '15 at 18:48