Heap allocators are not free. There is the cost per block of allocation (both in size and perhaps search if using a find best algorithm), the cost of joining blocks upon free, and any lost size per block when the size requested is smaller than the block size returned. There is also a cost in terms of memory fragmentation. Consider placing a small 1 byte allocation in the middle of your heap. At this point, you can no longer return a contiguous block larger than 1/2 the heap - half your heap is fragmented. Good allocators fight all the above and try to maximize all benefits.
Consider the following allocation system (used in many real world applications over a decade on numerous handheld game devices.)
Create a primary heap where each allocation has a prev ptr, next ptr, size, and possibly other info. Round it to 16 bytes per entry. Store this info before or after the actual memory pointer being returned - your choice as there are advantages to each. Yes, you are allocating requested size + 16 bytes here.
Now just keep a pointer to a free list and possibly a used list.
Allocation is done by finding a block on the free list big enough for our use and splitting it into the size requested and the remainder (first fit), or by searching the whole list for as exact a match as possible (best fit). Simple enough.
Freeing is moving the current item back into the free list, joining areas adjacent to each other if possible. You can see how this can go to O(n).
For smaller allocations, get a single allocation (either from newly created heap, or from global memory) that will be your unit allocation zone. Split this area up into "block size" chunk addresses and push these addresses onto a free stack. Allocation is popping an address from this list. Freeing is just pushing the allocation back to the list - both O(1).
Then inside your malloc/new/etc, check size if it is within the unit size, allocate from the unit allocator, otherwise use the O(n) allocator. My studies have shown that you can get 90-95% of allocations to fit within the unit allocator's block size without too much issue.
Also, you can allocate slabs of memory for memory pools and leave them allocated while using them over and over. A few larger allocations are much cheaper to manage (Unix systems use this alot...)
Advantages:
- All unit allocations have no external fragmentation.
- Small allocations run constant time.
- Larger allocation can be tailored to the exact size of the data requested.
Disadvantages:
- You pay an internal fragmentation cost when you want memory smaller than the block requsted.
- Many, many allocations and frees outside your small block size will eventually fragment your memory. This can lead to all sorts of unpleasantness. This can be managed with OS help or with care on alloc/free order.
- Once you get 1000s of large allocations going, there is a CPU price. Slabs / multiple heaps can be used to fix this as well.
There are many, many schemes out there, but this one is simple and has been used in commercial apps for ages though, so I wanted to start here.