
There is a big object that serves as an input to my program, and I don't want to initialize it every time. So I have tried Boost to serialize it (the object is 13.6 MB after serialization). But the performance is not very good: it still needs about one minute to load and deserialize. So I wonder, is there any method to make this process faster? I appreciate any hints or suggestions! Thank you in advance.

This is the save function:

void mysave(){
    dataprepocess dp; // dp is the object I want to save and load
    ofstream ofs("dp.dat", ios::binary);
    {
        boost::archive::binary_oarchive oa(ofs);
        // write class instance to archive
        oa << dp;
        // archive and stream closed when destructors are called
    }
    cout << "saving finished" << endl;
}

Here is my load function:

void myload(dataprepocess& dp){
  ifstream ifs1("dp_b.dat", ios::binary);
  {
    boost::archive::binary_iarchive ia1(ifs1);
    ia1 >> dp;
  }
  ifs1.close();
}

I have tried both text_archive and binary_archive, and it turned out they don't differ much in performance.

Constantine
    With the extreme lack of details it's difficult to say more than: precompute and measure. Precompute stuff. Measure where the time is spent (I/O? dynamic allocation? linking up things?). – Cheers and hth. - Alf Jul 10 '14 at 04:09
    You can give the illusion of speed by providing responsiveness in the thread that handles the user interaction and doing the serialization/unserialization in a background thread. –  Jul 10 '14 at 04:46

2 Answers

Some related references:
  1. Speed comparisons: how to do performance test using the boost library for a custom library
  2. Size trade-offs: Boost C++ Serialization overhead (also with compression)
  3. EOS Portable Archive (EPA) for portable binary archives

That said, deserialization can be slow, depending on the types deserialized. Speed depends on a lot of factors, quite possibly unrelated to the serialization library used.

  • Some data structures have costly insertion performance characteristics (see whether you can reserve capacity, load with hints, etc.)
  • you might have a lot of dynamic allocation (consider trying e.g. Boost's flat_map for contiguous storage, or load unsorted and sort the data once loading is complete; a sketch of this follows the list)
  • you might have non-inlined (virtual) dispatching - prefer loading/storing POD types in simple containers
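
To make the flat_map / sort-after-load idea concrete, here is a minimal sketch. It is my own illustration, not code from the question: the key/value types and the build_index name are made up, and it assumes the loaded pairs end up with unique keys.

#include <boost/container/flat_map.hpp>
#include <algorithm>
#include <utility>
#include <vector>

// Build a flat_map from raw (possibly unsorted) pairs that were just deserialized.
boost::container::flat_map<int, double>
build_index(std::vector<std::pair<int, double>> raw)
{
    // Sort once after loading instead of keeping the container ordered per insert.
    std::sort(raw.begin(), raw.end(),
              [](const std::pair<int, double>& a, const std::pair<int, double>& b)
              { return a.first < b.first; });

    boost::container::flat_map<int, double> m;
    m.reserve(raw.size()); // one allocation for the contiguous storage
    // ordered_unique_range tells flat_map the input is already sorted and unique.
    m.insert(boost::container::ordered_unique_range, raw.begin(), raw.end());
    return m;
}

Whether this helps depends entirely on what the object actually stores, so measure before and after.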

You will have to profile your code to find out where the performance bottleneck is.
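
As a rough starting point, a sketch like the one below (my own addition, not part of the original answer) separates the raw file read from the archive's deserialization work; the timed_load name and the in-memory copy are assumptions for illustration only.

#include <boost/archive/binary_iarchive.hpp>
#include <chrono>
#include <fstream>
#include <iostream>
#include <sstream>

// Time raw file reading separately from deserialization to see which one dominates.
template <class T>
void timed_load(T& obj, const char* path)
{
    using clock = std::chrono::steady_clock;

    auto t0 = clock::now();
    std::ifstream ifs(path, std::ios::binary);
    std::stringstream buffer;
    buffer << ifs.rdbuf();                 // slurp the whole file into memory
    auto t1 = clock::now();

    boost::archive::binary_iarchive ia(buffer);
    ia >> obj;                             // deserialize from the in-memory copy
    auto t2 = clock::now();

    using ms = std::chrono::milliseconds;
    std::cout << "file read:       " << std::chrono::duration_cast<ms>(t1 - t0).count() << " ms\n"
              << "deserialization: " << std::chrono::duration_cast<ms>(t2 - t1).count() << " ms\n";
}

If almost all of the time shows up in the second number, the disk is not the problem and the data structures being rebuilt are the place to look.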

sehe

You've not provided much info to go on, so only a general answer can be given.

So far as I can tell, Boost likes to serialise to XML, text, or a non-portable binary archive. Of the three I'd guess that the binary archive is the fastest, but it looks like it cannot be passed reliably from one computer to another (Boost describes it as non-portable). Binary serialisations are normally faster than text ones such as XML, which are always woefully slow in comparison.

So if you've gone and used the XML archive format you might get a speed improvement by switching to the binary archive format.
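
For illustration, here is a minimal sketch of that switch. It is my own example with a stand-in type (big_object), not the asker's dataprepocess: the serialize() member stays the same and only the archive class changes.

#include <boost/archive/xml_oarchive.hpp>
#include <boost/archive/binary_oarchive.hpp>
#include <boost/serialization/nvp.hpp>
#include <boost/serialization/vector.hpp>
#include <fstream>
#include <vector>

// Stand-in for the large object from the question, just so the snippet compiles.
struct big_object {
    std::vector<double> values;

    template <class Archive>
    void serialize(Archive& ar, unsigned /*version*/) {
        ar & BOOST_SERIALIZATION_NVP(values);
    }
};

void save_xml(const big_object& obj) {
    std::ofstream ofs("obj.xml");
    boost::archive::xml_oarchive oa(ofs);
    oa << BOOST_SERIALIZATION_NVP(obj);   // XML archives require name-value pairs
}

void save_binary(const big_object& obj) {
    std::ofstream ofs("obj.dat", std::ios::binary);
    boost::archive::binary_oarchive oa(ofs);
    oa << obj;                            // typically much faster and smaller than XML
}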

If you need the archive to be portable then you're going to have to ditch Boost and use something else.

Google Protocol Buffers spring to mind: free, portable, and binary (so maybe fast).

ASN.1 is a very good choice in my opinion because there are a whole range of binary representations that are portable and the schema language is superior to that of Google's Protocol Buffers. Good ASN.1 tools aren't free though.

bazza
  • I was gonna downvote this for FUD-ding ("Boost likes to" and "you're going to have to ditch boost"). But, well, the rest of the answer is informative, even if a bit biased +0 – sehe Jul 10 '14 at 06:43
  • @Sehe, if you'd care to review the links to Boost's own documentation and a link to just one of the myriad of articles about XML bloat (= slow) that I've included in my answer, then you may care to review your own comment. After many years of actually doing this stuff I settled on ASN.1 because of its speed, portability and maturity, even though tools aren't generally free. The schema language is really good, does constraints (unlike GPB) and is available in seemingly lots of different languages ranging from C to C# and Java. – bazza Jul 10 '14 at 07:07
  • My point is, the XML bloat is irrelevant. My answers to your "review" points were in my answer for about half an hour before your comment. _(The portability and cross-language capabilities are interesting but they don't answer any part of the question)_ – sehe Jul 10 '14 at 07:33
  • ["Currently there is a portable binary archive in the examples directory"](http://www.boost.org/doc/libs/1_55_0/libs/serialization/doc/todo.html) – Igor R. Jul 10 '14 at 17:55
  • @IgorR. the webpage you linked to is titled "Serialization - To Do", as in Not Done Yet, as in Doesn't Really Exist. Indeed the page even says that the one in the examples directory is incomplete especially when it comes to floating point numbers. Ok, there's clearly something there but not even the Boost guys are claiming that it's finished and ready to go. – bazza Jul 10 '14 at 22:45
  • @sehe, sorry sehe I didn't see your answer at the time of my last reply. I also see the OP has since added some code to the original question and has already tried the binary archive format in boost. You've clearly done a lot of work with Boost serialisation! I see from your links to your other answers that text and xml are indeed slower than binary as I'd suspected. I doubt that there's a significant speed advantage in any other binary serialiser over Boost's, and your suggestions are as good as anything is going to get. – bazza Jul 10 '14 at 23:14
  • @bazza it doesn't support float/double, but other than that it's a working archive - I used it in several projects. – Igor R. Jul 11 '14 at 06:14