
First, here is where I got the idea from:

There was once an app I wrote that used lots of little blobs of memory, each allocated with malloc(). It worked correctly but was slow. I replaced the many calls to malloc with just one, and then sliced up that large block within my app. It was much, much faster.

I was profiling my application, and I got an unexpectedly nice performance boost when I reduced the number of malloc calls. I am still allocating the same amount of memory, though.

So, I would like to do what this guy did, but I am unsure of the best way to do it.

My Idea:

```c
// static global variables
static Struct1 *memoryForStruct1; // can't call malloc() in a static initializer
static int struct1Index = 0;
...
// once, at startup:
memoryForStruct1 = malloc(sizeof(Struct1) * 10000);
...
// somewhere, I need memory, fast:
Struct1 *data = &memoryForStruct1[struct1Index++];
...
// done with data:
--struct1Index;
```

Gotchas:

  • I have to make sure I don't exceed 10000
  • I have to release the memory in the same order I acquired it. (Not a major issue in my case, since I am using recursion, but I would like to avoid the restriction if possible.)
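Both gotchas amount to two guard checks on a stack-like pool. A minimal sketch of that idea — the `Struct1` stub, `POOL_CAPACITY`, and the `pool_*` names are mine, not from the original code:

```c
#include <assert.h>
#include <stdlib.h>

// Hypothetical payload type standing in for Struct1.
typedef struct { int value; } Struct1;

#define POOL_CAPACITY 10000

static Struct1 *pool;    // one big malloc'ed block
static int pool_top = 0; // index of the next free slot

// Call once at startup; returns 0 on success.
static int pool_init(void) {
    pool = malloc(sizeof(Struct1) * POOL_CAPACITY);
    return pool ? 0 : -1;
}

// LIFO allocation: returns NULL instead of overrunning the pool (gotcha 1).
static Struct1 *pool_push(void) {
    if (pool_top >= POOL_CAPACITY)
        return NULL;
    return &pool[pool_top++];
}

// LIFO release: only the most recently pushed slot may be freed (gotcha 2).
static void pool_pop(void) {
    assert(pool_top > 0);
    --pool_top;
}
```

With recursion this maps naturally: `pool_push()` on the way down, `pool_pop()` on the way back up.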

Inspired by Mihai Maruseac:

First, I create a linked list of ints that tells me which memory indexes are free. I then add an `int memoryIndex` property to my struct, which lets me return occupied memory in any order. Luckily, I am sure my memory needs will never exceed 5 MB at any given time, so I can safely allocate that much up front. Solved.
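That free-list scheme can be sketched as follows — the `Node` type, `POOL_SIZE`, and the `fpool_*` names are my own illustration, with the list stored intrusively as an array of next-indexes:

```c
#include <assert.h>
#include <stdlib.h>

// Hypothetical payload; memoryIndex mirrors the field described above.
typedef struct {
    int memoryIndex;   // which pool slot this object occupies
    double payload;
} Node;

#define POOL_SIZE 1024

static Node fpool[POOL_SIZE];
static int next_free[POOL_SIZE]; // next_free[i] = index of the next free slot
static int free_head;

static void fpool_init(void) {
    for (int i = 0; i < POOL_SIZE; i++)
        next_free[i] = i + 1;      // chain every slot together
    next_free[POOL_SIZE - 1] = -1; // -1 terminates the list
    free_head = 0;
}

// O(1) allocation: pop a slot index off the free list.
static Node *fpool_alloc(void) {
    if (free_head == -1)
        return NULL;               // pool exhausted
    int i = free_head;
    free_head = next_free[i];
    fpool[i].memoryIndex = i;
    return &fpool[i];
}

// O(1) release, in any order: push the slot's index back on the list.
static void fpool_free(Node *n) {
    int i = n->memoryIndex;
    next_free[i] = free_head;
    free_head = i;
}
```

Because freeing just pushes an index, the LIFO-release restriction from the gotchas above disappears.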

Mazyod
  • That can be very useful. It can save both time and space if you have certain allocation behavior that you can take advantage of. For example, if you know your memory usage is stack-like, or that you always need chunks of the same size. The best way to handle it depends on these requirements. – Vaughn Cato Sep 25 '12 at 14:05

2 Answers


The system call that gives you memory is `brk`. The usual `malloc`, `calloc`, and `realloc` functions simply carve up the space obtained via `brk`. When that space is not enough, another `brk` call is made to create new space. Usually, the space grows in multiples of the virtual memory page size.

Thus, if you really want a premade pool of objects, make sure to allocate memory in multiples of the page size. For example, you can create pools of 4KB, 8KB, ... of space.
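A small helper for that rounding step (the helper name is mine; on POSIX systems the page size itself comes from `sysconf(_SC_PAGESIZE)`):

```c
#include <assert.h>
#include <stddef.h>

// Round a requested pool size up to a whole number of pages.
static size_t round_up_to_pages(size_t bytes, size_t page_size) {
    return ((bytes + page_size - 1) / page_size) * page_size;
}
```

So a request for 5000 bytes on a 4KB-page system would get an 8KB pool.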

Next, look at your objects. Some have one size, some another. Handling allocations for all of them from the same pool would be a big pain. Create pools for objects of various sizes (powers of 2 work best) and allocate from those. For example, an object of size 34B would get its space from the 64B pool.

Lastly, the remaining space can either be left unused or moved down to the other pools. In the above example, you have 30B left (64B − 34B). You'd split it into 16B, 8B, 4B, and 2B chunks and add each chunk to its respective pool.
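The split falls out of binary arithmetic: the power-of-two chunks of a leftover are exactly the set bits of its value (30 = 16 + 8 + 4 + 2). A sketch with a hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>

// Split `remainder` bytes into power-of-two chunks, largest first,
// as in the 30B -> 16B + 8B + 4B + 2B example above.
// Writes the chunk sizes into out[]; returns how many were written.
static int split_into_pow2(size_t remainder, size_t out[], int max_out) {
    int n = 0;
    // The set bits of `remainder` are exactly the chunk sizes.
    for (int bit = 8 * (int)sizeof(size_t) - 1; bit >= 0 && n < max_out; bit--) {
        size_t chunk = (size_t)1 << bit;
        if (remainder & chunk)
            out[n++] = chunk;
    }
    return n;
}
```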

Thus, you'd use linked lists to manage the preallocated space. This means your application will use more memory than it strictly needs, but if that really helps you, why not?

Basically, what I've described is a mix of the buddy allocator and the slab allocator from the Linux kernel.

Edit: After reading your comments — it is pretty easy to allocate one big area with `malloc(BIG_SPACE)` and use that as a pool for your memory.
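A minimal version of that idea, using the 5 MB bound mentioned in the question and hypothetical `arena_*` names: a bump allocator that carves aligned chunks out of one `malloc`'d block and releases everything at once.

```c
#include <assert.h>
#include <stdlib.h>
#include <stdint.h>

// One-big-malloc arena; 5 MB is the asker's stated upper bound.
#define BIG_SPACE (5 * 1024 * 1024)

static unsigned char *arena;
static size_t arena_used;

static int arena_init(void) {
    arena = malloc(BIG_SPACE);
    arena_used = 0;
    return arena ? 0 : -1;
}

// Bump allocation, aligned to 16 bytes; NULL when the arena is full.
static void *arena_alloc(size_t size) {
    size_t aligned = (arena_used + 15) & ~(size_t)15;
    if (aligned + size > BIG_SPACE)
        return NULL;
    arena_used = aligned + size;
    return arena + aligned;
}

// Everything is released at once; there are no individual frees.
static void arena_reset(void) { arena_used = 0; }
```

This trades the ability to free single objects for allocation that is just one add and one compare.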

Mihai Maruseac
  • I really like your explanation, but I can't figure out how to use it. Should I somehow allocate a large block through `brk`, so that malloc calls become faster by taking from that space? – Mazyod Sep 25 '12 at 14:34
  • Btw, I only have objects of one size, so it should be very simple. I would like to use your linked-list idea, but I can't figure that one out, either. – Mazyod Sep 25 '12 at 14:42
  • No, `brk` was just to show that it isn't `malloc` itself that takes the time. Doing 10 `malloc`s of 1KB is the same as doing 3 `malloc`s of 4KB. – Mihai Maruseac Sep 25 '12 at 14:54
  • For single larger allocations, often `mmap` is used instead of `brk`. – Seg Fault Sep 25 '12 at 15:59

If you can, look at using GLib, which has a memory-slicing API (`g_slice_new()` / `g_slice_free()`) that supports this. It's very easy to use, and saves you from having to re-implement it.

unwind