
I have been using cereal in highly time-sensitive software where every microsecond counts. My program runs in a loop and serializes a struct on every iteration. The struct contains some STL containers and strings and thus the size can vary between iterations.

I noticed that cereal takes much longer to complete on the very first serialization, and much less time in subsequent serialization attempts. It took approximately 600 microseconds the first time, then averaged 80 microseconds subsequently.

After tracing through the library I haven't been able to determine what is different about the first attempt versus all others. I'm guessing it has to do with parsing my struct or with allocating memory for the stringstream.

I found this post interesting, in particular the recommendation to extend a cereal class so that it does not use streams. I tried to create a version of the BinaryOutputArchive class that writes to a void* buffer instead of a std::ostream, but I have not been able to get it to compile. I also tried replacing the rdbuf of the stringstream as suggested here, but I could not get it to serialize properly.
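
For reference, a minimal sketch of the rdbuf approach using only the standard library. `FixedBuffer` and `write_record` are hypothetical names, not part of cereal; `write_record` merely stands in for a serializer that writes through a `std::ostream`, which is the interface `cereal::BinaryOutputArchive` is constructed from. The assumption is that an upper bound on the serialized size is known, so the storage can be allocated once up front.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <ostream>
#include <streambuf>
#include <string>

// Hypothetical helper (not part of cereal): a streambuf that writes into
// caller-owned storage and never allocates.
class FixedBuffer : public std::streambuf {
public:
    FixedBuffer(char* data, std::size_t capacity) {
        assert(data != nullptr);
        setp(data, data + capacity);  // put area covers the whole caller buffer
    }

    // Bytes written since construction or the last reset().
    std::size_t size() const {
        return static_cast<std::size_t>(pptr() - pbase());
    }

    // Rewind the write position so the same storage is reused next iteration.
    void reset() { setp(pbase(), epptr()); }

protected:
    // Called when the put area is full: report failure instead of growing.
    int_type overflow(int_type) override { return traits_type::eof(); }
};

// Stand-in for a serializer: writes a length-prefixed string through an ostream.
bool write_record(std::ostream& os, const std::string& s) {
    const std::uint32_t n = static_cast<std::uint32_t>(s.size());
    os.write(reinterpret_cast<const char*>(&n), sizeof n);
    os.write(s.data(), static_cast<std::streamsize>(s.size()));
    return os.good();
}
```

In the loop, one `FixedBuffer` and one `std::ostream` built on it would be created before the first iteration, with `reset()` called at the top of each iteration. If a write ever exceeds the capacity, the stream's error state is set and needs `os.clear()` before the next use. Whether this removes the first-iteration cost depends on where that cost actually comes from, which the sketch does not establish.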

Does anyone have a recommendation on how to improve cereal's performance, especially on the very first serialization? Or perhaps a way to achieve deterministic latencies? Am I on the right track with my attempts above?

L. Chang
  • Have you tried warming it up on app startup before the actual loop starts rolling? – bipll Feb 20 '18 at 01:16
  • @bipll I've considered that, and it would probably solve the issue of the very first serialization (in the loop) taking a long time. However, I want deterministic behavior. What if after some time the memory gets thrashed and cereal performs some operation that takes a long time, the same type of operation that caused the initial serialization to be slow? That's why I want to ensure performance on every attempt. – L. Chang Feb 20 '18 at 01:32
  • Then you probably need to make your own version compile or, alternatively, use a sufficiently large preallocated stringstream buffer. – bipll Feb 20 '18 at 07:14
  • My suggestion would be to investigate a non-stream solution to see whether the stream is the cause of your initial slowdown. Otherwise, perhaps you are getting an icache miss on the first pass through the serialization code. Have you tried using something like [cachegrind](http://valgrind.org/docs/manual/cg-manual.html) to debug? For a simple struct with no pointers or polymorphism, cereal should not be tracking or allocating anything extra. – Azoth Mar 12 '18 at 03:13
  • @Azoth I have not tried cachegrind, I can give that a shot. – L. Chang Mar 15 '18 at 04:50
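
The warm-up suggestion in the comments can be checked with a small timing harness. The following is a sketch only: `measure` and `worst` are hypothetical helper names, and `work` is assumed to be a callable that performs one serialization. It runs a number of untimed warm-up calls, then records one latency sample per timed call so that the worst case can be inspected rather than just the average.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: calls `work` `warmup` times untimed, then `iterations`
// times timed, and returns one latency sample per timed call.
template <typename F>
std::vector<std::chrono::nanoseconds> measure(F&& work,
                                              std::size_t warmup,
                                              std::size_t iterations) {
    using clock = std::chrono::steady_clock;  // monotonic, suited to intervals

    for (std::size_t i = 0; i < warmup; ++i) {
        work();
    }

    std::vector<std::chrono::nanoseconds> samples;
    samples.reserve(iterations);  // keep allocation out of the timed region
    for (std::size_t i = 0; i < iterations; ++i) {
        const auto start = clock::now();
        work();
        const auto stop = clock::now();
        samples.push_back(
            std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start));
    }
    return samples;
}

// Largest sample, i.e. the worst observed latency.
std::chrono::nanoseconds worst(const std::vector<std::chrono::nanoseconds>& samples) {
    assert(!samples.empty());
    return *std::max_element(samples.begin(), samples.end());
}
```

Comparing `worst(measure(work, 0, n))` against `worst(measure(work, k, n))` for some small `k` would show whether the slow first call disappears after warm-up. It does not by itself explain later outliers, which is where a tool such as cachegrind would come in.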
