
I'm having some issues using pooled memory allocators for std::list objects in a multi-threaded application.

The part of the code I'm concerned with runs each thread function in isolation (i.e. there is no communication or synchronization between threads) and therefore I'd like to set up separate memory pools for each thread, where each pool is not thread-safe (and hence fast).

I've tried using a shared thread-safe singleton memory pool and found the performance to be poor, as expected.

This is a heavily simplified version of the type of thing I'm trying to do. A lot has been included in a pseudo-code kind of way, sorry if it's confusing.

/* The thread functor - one instance of MAKE_QUADTREE created for each thread
 */
class make_quadtree
{
private:

/* A non-thread-safe memory pool for int linked list items, let's say that it's 
 * something along the lines of BOOST::OBJECT_POOL
 */
    pooled_allocator<int> item_pool;

/* The problem! - a local class that would be constructed within each std::list as the
 * allocator but really just delegates to ITEM_POOL
 */
    class local_alloc
    {
    public :
    //!! I understand that I can't access ITEM_POOL from within a nested class like
    //!! this, that's really my question - can I get something along these lines to
    //!! work??
        pointer allocate (size_t n) { return ( item_pool.allocate(n) ); }
    };

public :
    make_quadtree (): item_pool()    // only construct 1 instance of ITEM_POOL per
                                     // MAKE_QUADTREE object
    {
    /* The kind of data structures - vectors of linked lists
     * The idea is that all of the linked lists should share a local pooled allocator
     */
        std::vector<std::list<int, local_alloc>> lists;

    /* The actual operations - too complicated to show, but in general:
     *
     * - The vector LISTS is grown as a quadtree is built; its size is the number of
     *   quadtree "boxes"
     *
     * - Each element of LISTS (each linked list) represents the IDs of items
     *   contained within each quadtree box (say they're xy points), as the quadtree
     *   is grown a lot of ID pop/push-ing between lists occurs, hence the memory pool
     *   is important for performance
     */
    }
};

So really my problem is that I'd like to have one memory pool instance per thread functor instance, but within each thread functor share the pool between multiple std::list objects.

villintehaspam
Darren Engwirda
  • I realize you have chosen an answer, but... If this is Windows, have a look at SmartHeap or HeapAgent from MicroQuill. It is the best add-on lib I have tested or used for this kind of problem. And no, I'm not affiliated with them. – JimR Jan 07 '11 at 02:40

2 Answers


Why not just construct an instance of local_alloc with a reference to make_quadtree?

Puppy
  • I was thinking about that, but I couldn't figure out how to write a default ctor for local_alloc (as would be called in std::list) that would assign the reference/pointer?? – Darren Engwirda Jan 07 '11 at 01:04
  • @Darren: It doesn't have to be default-constructed, you can pass an instance in. – Puppy Jan 07 '11 at 01:09
  • Do you mean that I can pass an instance of the allocator when I construct LISTS?? Will this work since I have a vector of lists rather than just one container?? What about when I resize the vector, do the new list objects get the right allocator?? Sorry, I don't usually do it this way, I don't think I get it... – Darren Engwirda Jan 07 '11 at 01:28
  • @Darren: Of course you can. It's a constructor parameter and works the same as any other constructor parameter- you pass it in when you want a new object. – Puppy Jan 07 '11 at 01:43
  • OK I think I get it, thanks heaps. I was confused about the case where vector.resize() is called. I was worried that it would cause the default ctor to be called (for std::list) but I realise now that there's a second (default) param that can be used to prevent this! – Darren Engwirda Jan 07 '11 at 02:01

A thread specific allocator is quite the challenge.

I spent some time looking for a thread-specific allocator "off the shelf". The best I found was Hoard (hoard.org). This provided a significant performance improvement; however, Hoard has some serious drawbacks:

  • I experienced some crashes during testing
  • Commercial licensing is expensive
  • It 'hooks' system calls to malloc, a technique I consider dodgy.

So I decided to roll my own thread-specific memory allocator, based on boost::pool and boost::thread_specific_ptr. This required a small amount of, IMHO, seriously advanced C++ code, but it now seems to be working well.

It is now quite a few months since I looked at the details of this, but perhaps I can take a look at it once more.

You commented that you are looking for a thread-specific, but not thread-safe, allocator. This makes sense: if the allocator is thread-specific, it does not need to be thread-safe. However, in my experience, the extra burden of being thread-safe is trivial, so long as contention does NOT occur.

All that theory is fun, but I think we should move on to practicalities. I believe that we need a small, instrumented stand-alone program that demonstrates the problem you need to solve. I had what sounded like a very similar problem with std::multiset allocation and wrote the program you can see here: Parallel reads from STL containers

If you can write something similar, showing your problem, then I can check if my thread specific memory allocator can be applied with advantage in your case.

ravenspoint
  • I've heard of hoard (and some other threaded allocators), but as far as I understand it (could be wrong!) they still use some kind of thread protection at some stage. I was really trying to set up local non-thread-safe pools... Does BOOST::THREADSPECIFICPOINTER use TLS (__declspec(thread) on Windows)? If so I don't think I can use it, since I'm compiling a *.dll – Darren Engwirda Jan 07 '11 at 01:33
  • @Darren: Both nedmalloc and tcmalloc do thread-cached local arenas as well (using explicit TLS, so it's DLL-safe), but any production-level thread-cached allocator will enter thread locks at least once in a while, most of the time to request new memory pages for the local arena; this also allows memory allocated in one thread to be passed around and freed in another – Necrolis Jan 07 '11 at 04:34