30

Consider two applications: one (#1) that invokes malloc() many times, and another (#2) that invokes malloc() only a few times. Both applications allocate the same total amount of memory (assume 100 MB).
For which application will the next malloc() call be faster, #1 or #2?
In other words: does malloc() have an index of allocated locations in memory?

Jonathan Leffler
Dor
  • 1
    It does (have an index of allocated locations) -- how would `free` work otherwise? But that doesn't need to make the next `malloc` call slower. If one of the programs has alloced and freed a lot and created fragmentation, that will make the next `malloc` call slower, though, because the free list will be a long chain of blocks, most of them too small. – Pascal Cuoq Jan 16 '10 at 22:38
  • 1
    One observation is that malloc'ing smaller chunks of memory may be a better approach once memory resources become tight. It may be easier to find a small block of free memory here and there, rather than one giant block somewhere. Not sure how this would affect performance though. – D.C. Jan 16 '10 at 22:38
  • 3
    The malloc/free data structure usually keeps a linked list of free blocks, and does not usually track allocated blocks. It usually prepends the allocated data with a header. On free it looks in the header to find the size of the allocation, and then adds it to the linked list of free blocks. So there is a list (but not an index) of free blocks, and nothing keeping track of allocated blocks except the programmer themselves. (Of course, a malloc implementation could track them, and it might be a pretty good way of debugging memory leaks.) – benno Jan 16 '10 at 23:33
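The size-header scheme benno describes can be sketched as a toy pair of functions. This is purely illustrative (the names `my_malloc`/`my_free` and the single-field header are invented, and real allocators store richer metadata), but it shows how `free` can recover a block's size without any index of allocations:

```c
#include <assert.h>
#include <stdlib.h>

/* Toy sketch, not a real allocator: prepend a size header to each
 * allocation so that the "free" side can recover the size later. */
typedef struct {
    size_t size;               /* size of the user region that follows */
} header_t;

static void *my_malloc(size_t n) {
    header_t *h = malloc(sizeof(header_t) + n);
    if (h == NULL)
        return NULL;
    h->size = n;
    return h + 1;              /* user pointer starts just past the header */
}

static size_t my_free(void *p) {
    header_t *h = (header_t *)p - 1;   /* step back to the hidden header */
    size_t n = h->size;                /* this is how "free" learns the size */
    free(h);
    return n;                          /* returned only for demonstration */
}
```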

8 Answers

21

You asked 2 questions:

  • for which application the next malloc() call will be faster, #1 or #2?
  • In other words: Does malloc() have an index of allocated locations in memory?

You've implied that they are the same question, but they are not. The answer to the latter question is YES.

As for which will be faster, it is impossible to say. It depends on the allocator algorithm, the machine state, the fragmentation in the current process, and so on.

Your idea is sound, though: you should think about how malloc usage will affect performance. There was once an app I wrote that used lots of little blobs of memory, each allocated with malloc(). It worked correctly but was slow. I replaced the many calls to malloc with just one, and then sliced up that large block within my app. It was much much faster.

I don't recommend this approach; it's just an illustration of the point that malloc usage can materially affect performance.

My advice is to measure it.
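The "one big malloc, sliced up inside the app" approach described above amounts to a bump (arena) allocator. A minimal sketch, with invented names (`arena_t`, `arena_alloc`) and a hard-coded 16-byte alignment assumption:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One malloc() up front; every "allocation" afterwards is just a
 * pointer bump, with no free-list search and no per-block free(). */
typedef struct {
    unsigned char *base;
    size_t size;
    size_t used;
} arena_t;

static int arena_init(arena_t *a, size_t size) {
    a->base = malloc(size);    /* the single large block */
    a->size = size;
    a->used = 0;
    return a->base != NULL;
}

static void *arena_alloc(arena_t *a, size_t n) {
    size_t aligned = (n + 15u) & ~(size_t)15u;  /* keep 16-byte alignment */
    if (a->used + aligned > a->size)
        return NULL;                            /* arena exhausted */
    void *p = a->base + a->used;
    a->used += aligned;
    return p;
}

static void arena_destroy(arena_t *a) {
    free(a->base);             /* one free releases every sub-block */
    a->base = NULL;
}
```

The trade-off, as noted above, is that individual sub-blocks cannot be freed separately; the whole arena lives and dies together.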

Cheeso
  • 1
    Sorry to bring up an old post, but a question: why is it that you don't recommend this approach? – Fingolfin Nov 08 '12 at 12:26
  • 2
    I don't recommend it, in general. I recommend keeping things simple. YAGNI. If you're seeing performance problems with memory allocation, by all means, try different approaches and *measure them*. But memory allocation algorithms have improved significantly since the time I had this problem. – Cheeso Nov 11 '12 at 06:50
11

Of course this completely depends on the malloc implementation, but in this case, with no calls to free, most malloc implementations will probably give you the same algorithmic speed.

As another answer commented, usually there will be a list of free blocks, but if you have not called free, there will just be one, so it should be O(1) in both cases.

This assumes that the memory allocated for the heap is big enough in both cases. In case #1 you will have allocated more total memory, because each allocation carries per-block overhead for metadata; as a result, you may need to call sbrk() (or equivalent) to grow the heap, which would add additional overhead.

They will probably be different due to cache and other second order effects, since the memory alignments for the new allocation won't be the same.

If you have been freeing some of the memory blocks, then it is likely that #2 will be faster due to less fragmentation, and so a smaller list of free blocks to search.

If you have freed all the memory blocks, it should end up being exactly the same, since any sane free implementation will have coalesced the blocks back into a single arena of memory.

benno
6

Malloc has to run through a linked list of free blocks to find one to allocate. This takes time. So, #1 will usually be slower:

  • The more often you call malloc, the more time it will take - so reducing the number of calls will give you a speed improvement (though whether it is significant will depend on your exact circumstances).

  • In addition, if you malloc many small blocks, then as you free those blocks, you will fragment the heap much more than if you only allocate and free a few large blocks. So you are likely to end up with many small free blocks on your heap rather than a few big blocks, and therefore your mallocs may have to search further through the free-space lists to find a suitable block to allocate. Which again will make them slower.
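The free-list walk this answer describes is classically a first-fit search. A minimal sketch (the `free_block` structure and `first_fit` function are invented for illustration; real allocators use segregated lists, trees, or bins to shorten exactly this search):

```c
#include <assert.h>
#include <stddef.h>

/* A singly linked list of free blocks, as kept by a simple allocator. */
typedef struct free_block {
    size_t size;                /* bytes available in this free block */
    struct free_block *next;
} free_block;

/* First-fit: return the first block with size >= n, or NULL.  The
 * longer and more fragmented the list, the more nodes this loop
 * must visit -- which is the slowdown described above. */
static free_block *first_fit(free_block *head, size_t n) {
    for (free_block *b = head; b != NULL; b = b->next)
        if (b->size >= n)
            return b;
    return NULL;
}
```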

Jason Williams
  • +1 heap fragmentation can kill performance if you have lots of small objects on the heap. – pjc50 Feb 05 '10 at 11:33
  • Regarding the first bullet point: as other answers mention, if you only call malloc (and no free), then time will stay constant in an implementation using a list of free blocks, which seems to be the common case - with occasional hiccups when the heap needs to grow. – hmijail Feb 22 '17 at 10:44
  • My point was that calling any function 100 times incurs 100 times the overhead of calling that same function once. – Jason Williams Feb 22 '17 at 22:30
3

These are of course implementation details, but typically free() will insert the memory into a list of free blocks. malloc() will then look at this list for a free block that is the right size, or larger. Typically, only if this fails does malloc() ask the kernel for more memory.

There are also other considerations, such as when to coalesce multiple adjacent blocks into a single, larger block.

And, another reason that malloc() is expensive: if malloc() is called from multiple threads, there must be some kind of synchronization on these global structures (i.e. locks). There exist malloc() implementations with different optimization schemes to make it better for multiple threads, but generally, keeping it thread-safe adds to the cost, as multiple threads will contend for those locks and block progress on each other.
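The naive version of that synchronization is a single global lock around the heap, roughly sketched below (the wrapper names are invented; real implementations such as glibc's ptmalloc use multiple arenas or per-thread caches precisely to avoid this single point of contention):

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* One global lock guarding the allocator's shared structures: every
 * thread that allocates or frees contends on the same mutex. */
static pthread_mutex_t heap_lock = PTHREAD_MUTEX_INITIALIZER;

static void *locked_malloc(size_t n) {
    pthread_mutex_lock(&heap_lock);
    void *p = malloc(n);
    pthread_mutex_unlock(&heap_lock);
    return p;
}

static void locked_free(void *p) {
    pthread_mutex_lock(&heap_lock);
    free(p);
    pthread_mutex_unlock(&heap_lock);
}
```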

asveikau
3

You can always do a better job using malloc() to allocate a large chunk of memory and sub-dividing it yourself. malloc() is optimized to work well in the general case and makes no assumptions about whether or not you use threads or what the size of the program's allocations might be.

Whether it is a good idea to implement your own sub-allocator is a secondary question. It rarely is: explicit memory management is already hard enough, and you rarely need another layer of code that can screw up and crash your program without any good way to debug it. Unless you are writing a debug allocator.

Hans Passant
2

The answer is that it depends. Most of the potential slowness comes from malloc() and free() in combination, and usually #1 and #2 will be of similar speed.

All malloc() implementations do have an indexing mechanism, but the speed of adding a new block to the index is usually not dependent on the number of blocks already in the index.

Most of the slowness of malloc comes from two sources

  • searching for a suitable free block among the previously freed blocks
  • multi-processor problems with locking

Writing my own almost-standards-compliant malloc() replacement took malloc() and free() times from 35% of total run time down to 3-4%, because it seriously optimised those two factors. It would likely have been similarly fast to use some other high-performance malloc, but having our own was more portable to esoteric devices, and of course it allows free() to be inlined in some places.

Erik Elmgren
1

You don't define the relative difference between "many" and "few" but I suspect most mallocs would function almost identically in both scenarios. The question implies that each call to malloc has as much overhead as a system call and page table updates. When you do a malloc call, e.g. malloc(14), in a non-brain-dead environment, malloc will actually allocate more memory than you ask for, often a multiple of the system MMU page size. You get your 14 bytes and malloc keeps track of the newly allocated area so that later calls can just return a chunk of the already allocated memory, until more memory needs to be requested from the OS.

In other words, if I call malloc(14) 100 times or malloc(1400) once, the overhead will be about the same. I'll just have to manage the bigger allocated memory chunk myself.
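On glibc you can actually observe the rounding described above via `malloc_usable_size()`, which reports the real capacity of the chunk backing a pointer. Note this is a non-portable, glibc-specific call, and the exact capacity it reports is an implementation detail:

```c
#include <assert.h>
#include <malloc.h>   /* malloc_usable_size(): glibc-specific, non-portable */
#include <stdlib.h>

/* Ask for 14 bytes and report how much the allocator really set aside.
 * The result is at least 14, and typically rounded up. */
static size_t capacity_of_14(void) {
    void *p = malloc(14);
    size_t cap = (p != NULL) ? malloc_usable_size(p) : 0;
    free(p);
    return cap;
}
```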

Richard Pennington
1

Allocating one block of memory is faster than allocating many blocks. There is the overhead of the system call and also of searching for available blocks. In programming, reducing the number of operations usually speeds up the execution time.

Memory allocators may have to search to find a block of memory that is the correct size. This adds to the overhead of the execution time.

However, there may be better chances of success when allocating small blocks of memory versus one large block. Is your program allocating one small block and releasing it, or does it need to allocate (and preserve) small blocks? When memory becomes fragmented, there are fewer big chunks available, so the memory allocator may have to coalesce free blocks to form a block big enough for the allocation.

If your program is allocating and destroying many small blocks of memory you may want to consider allocating a static array and using that for your memory.
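The static-array suggestion above is typically realized as a fixed-size object pool with a free list threaded through the unused slots. A minimal sketch, with invented names (`pool_alloc`, `pool_release`) and arbitrary slot dimensions; allocation and release are O(1) and never touch the system allocator:

```c
#include <assert.h>
#include <stddef.h>

#define SLOT_SIZE  32   /* arbitrary: must be >= sizeof(void *) */
#define SLOT_COUNT 64   /* arbitrary pool capacity */

static unsigned char pool[SLOT_COUNT][SLOT_SIZE];
static void *free_list = NULL;
static int pool_ready = 0;

/* Thread a free list through the static array: each empty slot's first
 * bytes hold a pointer to the next empty slot. */
static void pool_init(void) {
    for (int i = 0; i < SLOT_COUNT; i++) {
        *(void **)pool[i] = free_list;
        free_list = pool[i];
    }
    pool_ready = 1;
}

static void *pool_alloc(void) {
    if (!pool_ready)
        pool_init();
    void *p = free_list;
    if (p != NULL)
        free_list = *(void **)p;   /* pop the head slot */
    return p;                      /* NULL when the pool is exhausted */
}

static void pool_release(void *p) {
    *(void **)p = free_list;       /* push the slot back onto the list */
    free_list = p;
}
```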

Thomas Matthews