
I am trying to implement an application with boost::multi_index and performance is very bad: inserting 10,000 objects takes almost 0.1 seconds, which is not acceptable. Looking into the documentation, I found that boost::multi_index can accept a memory allocator as its last template parameter, but I got a lot of compile errors when I tried to use one myself. Please help me correct this. Thanks.

struct order
{
    unsigned int    id;
    unsigned int    quantity;
    double          price;
};

struct id{};
struct price{};

typedef multi_index_container<
  order,
  indexed_by<
    hashed_unique<
      tag<id>,  BOOST_MULTI_INDEX_MEMBER(order, unsigned int, id)>,
    ordered_non_unique<
      tag<price>, BOOST_MULTI_INDEX_MEMBER(order, double, price),
        std::less<double> >
  >,
  boost::object_pool<order>
> order_sell; 

In general, the compiler rejects boost::object_pool used as the allocator in the order_sell declaration.

Joaquín M López Muñoz
loudking

3 Answers


Let me re-state Alexander's suggestion that you profile your program in order to understand where the problem really lies. I strongly doubt Boost.MultiIndex per se can be as slow as you say. The following program measures the time taken to create an order_sell container (without Boost.Pool), populate it with 10,000 random orders and destroy it:

Live Coliru Demo

#include <algorithm>
#include <array>
#include <chrono>
#include <numeric> 

std::chrono::high_resolution_clock::time_point measure_start,measure_pause;

template<typename F>
double measure(F f)
{
  using namespace std::chrono;

  static const int              num_trials=10;
  static const milliseconds     min_time_per_trial(200);
  std::array<double,num_trials> trials;
  volatile decltype(f())        res; /* to avoid optimizing f() away */

  for(int i=0;i<num_trials;++i){
    int                               runs=0;
    high_resolution_clock::time_point t2;

    measure_start=high_resolution_clock::now();
    do{
      res=f();
      ++runs;
      t2=high_resolution_clock::now();
    }while(t2-measure_start<min_time_per_trial);
    trials[i]=duration_cast<duration<double>>(t2-measure_start).count()/runs;
  }
  (void)res; /* suppress unused-variable warning */

  std::sort(trials.begin(),trials.end());
  return std::accumulate(
    trials.begin()+2,trials.end()-2,0.0)/(trials.size()-4);
}

void pause_timing()
{
  measure_pause=std::chrono::high_resolution_clock::now();
}

void resume_timing()
{
  measure_start+=std::chrono::high_resolution_clock::now()-measure_pause;
}

#include <boost/multi_index_container.hpp>
#include <boost/multi_index/hashed_index.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index/member.hpp>

using namespace boost::multi_index;

struct order
{
    unsigned int    id;
    unsigned int    quantity;
    double          price;
};

struct id{};
struct price{};

typedef multi_index_container<
  order,
  indexed_by<
    hashed_unique<
      tag<id>,BOOST_MULTI_INDEX_MEMBER(order, unsigned int, id)>,
    ordered_non_unique<
      tag<price>,BOOST_MULTI_INDEX_MEMBER(order ,double, price),
        std::less<double> >
  >
> order_sell; 

#include <iostream>
#include <random>

int main()
{
  std::cout<<"Insertion of 10,000 random orders plus container cleanup\n";
  std::cout<<measure([](){
    order_sell os;
    std::mt19937                                gen{34862};
    std::uniform_int_distribution<unsigned int> uidist;
    std::uniform_real_distribution<double>      dbdist;

    for(unsigned int n=0;n<10000;++n){
      os.insert(order{uidist(gen),0,dbdist(gen)});
    }
    return os.size();
  })<<" seg.\n";
}

When run in -O3 mode with whatever backend Coliru uses, we get:

Insertion of 10,000 random orders plus container cleanup
0.00494657 seg.

VS 2015 in release mode on my machine (Intel Core i5-2520M @ 2.50 GHz) yields:

Insertion of 10,000 random orders plus container cleanup
0.00492825 seg.

So, this is around 20 times faster than you report, and I'm including container destruction and random number generation in my measurements.

A couple of additional observations:

  • boost::object_pool does not provide the allocator interface the standard library specifies for interoperability with containers. You might want to use boost::pool_allocator instead (I've played with it a bit and it doesn't seem to improve speed, though your mileage may vary).
  • John's answer seems to imply that Boost.MultiIndex is suboptimal in the sense that it allocates nodes separately from the values, or something like that. In fact, the library is as efficient as it can possibly get in terms of memory allocation, and you can't do better with Boost.Intrusive (you can get just the same, actually). Take a look at this answer of mine if you're curious about what Boost.MultiIndex's internal data structures look like. In particular, for your order_sell container with a hashed index and an ordered index, each value goes into one node of its own, plus you have a separate so-called bucket array (an array of pointers) with a length roughly equal to the number of elements. You can't do better than that for a node-based data structure (if you're willing to give up iterator stability, you could devise more memory-efficient schemes).
Joaquín M López Muñoz

You can't, or shouldn't, do this for a couple of reasons.

First, boost::object_pool has a performance problem: releasing objects from it is O(N). If you want to do this efficiently you need to implement your own allocator on top of boost::pool directly. The reason is that object_pool uses the "ordered free" semantics, which you don't want for your use case. For more details on this performance bug, see here: https://svn.boost.org/trac/boost/ticket/3789

Second, multi_index_container actually needs to allocate a few different things, depending on the indexes you select. It's not enough to be able to allocate value_type, it needs to allocate tree nodes etc. This makes it wholly unsuitable for use with a pool allocator, because a pool allocator usually assumes many instances of a single type (or at least a single size).

If you want the very best performance, you may need to "roll your own." Boost MIC and Boost Pool certainly won't play well together. But another idea is to use a higher-performance general-purpose allocator, such as tcmalloc: http://goog-perftools.sourceforge.net/doc/tcmalloc.html

You might consider Boost Intrusive, which has containers that are well-suited to pooled allocation. You could add hooks to your order type to enable storing them in ordered and unordered maps, and then you could allocate the orders in a boost::pool.

Finally, since it appears you are storing financial data, you should know that using double to store prices is hazardous. For more on that, see here: Why not use Double or Float to represent currency?

John Zwinck
  • I think you're making some inaccurate assumptions on the internal data structure of Boost.MultiIndex. Please see [my answer](http://stackoverflow.com/a/37098035/213114) for further details. – Joaquín M López Muñoz May 08 '16 at 08:59
  • @JoaquínMLópezMuñoz: Would you agree if I phrased it this way? "Boost MIC allocates objects whose type you do not know and whose size is not the same as `value_type`." – John Zwinck May 08 '16 at 14:23
  • I agree with this latter statement, but I don't see how this implies that the lib is wholly unsuitable for use with a pool allocator, or that Boost MIC and Boost Pool certainly won't play well together: just plug `boost::pool_allocator` into the definition of `order_sell` and check that it works (I did). You also say that Boost.Intrusive, in comparison, is well-suited to pool allocation: well, laying out a Boost.Intrusive-based container with an `unordered_set` hook and a `multiset` hook will give you the same layout as a Boost.MultiIndex-based `order_sell` with `boost::pool_allocator`. – Joaquín M López Muñoz May 08 '16 at 14:55
  • Let me guess that your thinking is that any non-intrusive container (such as Boost.MultiIndex but also all node-based standard library containers such as `std::set` and `std::unordered_set`) is not amenable to pool allocation because the type and size of the internal nodes does not coincide with `value_type`. This is certainly the case, but not really a problem as the container internally gets the proper allocator instance via `allocator::rebind`. I understand this can pose some problems as you can't declare the pool or instantiate it explicitly in your code, right? Is this what you have in mind? – Joaquín M López Muñoz May 08 '16 at 15:01
  • Right: I create an `object_pool` for example, but the MIC wants to allocate `sizeof(order) + 32`, and my pool may not support that. – John Zwinck May 08 '16 at 15:29
  • Yes, there's no way a non-intrusive node-based container can be made to use your `object_pool`, but via `pool_allocator::rebind` it'll end up instantiating an `object_pool` (visible only to the container). – Joaquín M López Muñoz May 08 '16 at 15:43
  • Mind, I understand intrusive containers are much more flexible than non-intrusive ones in terms of memory allocation, object lifetime tracking etc. (this is what Boost.Intrusive is for, to begin with). I'm only saying that a) non-intrusive containers can be made to work with pooled allocation and b) a `multi_index_container` has the same node structure as the equivalent multi-hook Boost.Intrusive-based container. – Joaquín M López Muñoz May 08 '16 at 15:44

The first thing you need to do (as always in case of performance bottlenecks) - is to profile!

It might turn out (and probably will) that allocation is not your main bottleneck.