7

How to write a thread-safe and efficient, lock-free memory allocator in C? By efficient I mean:

  1. Fast allocation & deallocation

  2. Optimal memory usage (minimal wastage and no external fragmentation)

  3. Minimal meta-data overhead

Viet

3 Answers

13

http://www.research.ibm.com/people/m/michael/pldi-2004.pdf

This paper presents a completely lock-free memory allocator. It uses only widely-available operating system support and hardware atomic instructions. It offers guaranteed availability even under arbitrary thread termination and crash-failure, and it is immune to dead-lock regardless of scheduling policies, and hence it can be used even in interrupt handlers and real-time applications without requiring special scheduler support. Also, by leveraging some high-level structures from Hoard, our allocator is highly scalable, limits space blowup to a constant factor, and is capable of avoiding false sharing...

gnat
oz10
  • This paper is the first thing I thought of when I saw the question. We used a variation of this allocator in one of our products and it was really very helpful. – Dan Olson Jan 03 '10 at 20:14
  • Thanks, Dan. This sounds great! It gives me the confidence to improve on it. – Viet Jan 04 '10 at 01:38
  • Be aware that there are bugs in the paper as published. At minimum, the algorithm should be model checked. I hope this has been done since 2004, but I don't know. – Norman Ramsey Jan 04 '10 at 01:52
  • @Norman Ramsey: +1 you are exactly right... the paper should be used as a starting point. – oz10 Jan 04 '10 at 16:10
  • +1 very interesting link. Do you have links for the benchmarks they used in the paper? – R.. GitHub STOP HELPING ICE Jul 12 '11 at 20:59
  • Updated link [here](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.87.3870&rep=rep1&type=pdf). – KeyC0de Oct 03 '19 at 17:24
4

Depends on what you mean by efficiency. If my concern was to make things fast, then I would probably give each thread its own separate memory pool to work with, and a custom 'malloc' that took memory from that pool. Of course, if my concern was speed, I would probably avoid allocation in the first place.

There is no one answer; you'll be balancing a range of concerns. It is pretty much impossible to get a fully lock-free allocator, but you can either do the locking early and infrequently (by allocating large pools for each thread) or make the locks so small and tight that they are easy to get right.

Chris Arguin
2

You can use a lock free list and a couple of buckets of differing sizes.

So:

#include <windows.h>
#include <malloc.h>    // _aligned_malloc
#include <stddef.h>    // offsetof

typedef struct
{
    union
    {
        SLIST_ENTRY entry;  // used while the block sits on a bucket's free list
        void* list;         // used while allocated: points back to the owning bucket
    };
    BYTE mem[];
} mem_block;

typedef struct
{
    SLIST_HEADER root;
} mem_block_list;

#define BUCKET_COUNT 4
#define BLOCKS_TO_ALLOCATE 16

static mem_block_list Buckets[BUCKET_COUNT];

void init_buckets()
{
    for( int i = 0; i < BUCKET_COUNT; ++i )
    {
        InitializeSListHead( &Buckets[i].root );
        for( int j = 0; j < BLOCKS_TO_ALLOCATE; ++j )
        {
            // SList entries must be MEMORY_ALLOCATION_ALIGNMENT-aligned;
            // bucket i holds blocks of (0x1 << i) * 0x8 bytes
            mem_block* p = (mem_block*) _aligned_malloc(
                sizeof( mem_block ) + (0x1 << i) * 0x8,
                MEMORY_ALLOCATION_ALIGNMENT );
            InterlockedPushEntrySList( &Buckets[i].root, &p->entry );
        }
    }
}

void* balloc( size_t size )
{
    for( int i = 0; i < BUCKET_COUNT; ++i )
    {
        if( size <= (0x1 << i) * 0x8 )
        {
            mem_block* p = (mem_block*) InterlockedPopEntrySList( &Buckets[i].root );
            if( !p )
                return 0;             // bucket exhausted
            p->list = &Buckets[i];    // remember the owning bucket for bfree
            return p->mem;
        }
    }

    return 0;   // block too large
}

void bfree( void* p )
{
    mem_block* block = (mem_block*) (((BYTE*)p) - offsetof( mem_block, mem ));
    InterlockedPushEntrySList( &((mem_block_list*)block->list)->root,
                               &block->entry );
}

SLIST_ENTRY and SLIST_HEADER are the types, and InitializeSListHead, InterlockedPushEntrySList and InterlockedPopEntrySList the operations, that Win32 provides for lock-free singly-linked lists. Use the corresponding primitives on other OSes.

Drawbacks:

  • Overhead of sizeof( SLIST_ENTRY ) per block
  • The buckets can only grow efficiently once, at startup; after that you can run out of memory and have to ask the OS for more or steal from other buckets (and stealing from other buckets leads to fragmentation)
  • This sample is deliberately simple and would have to be extended to handle more cases
Christopher