
I want to estimate memory consumption in C++ when I allocate objects on the heap. I can start my calculation with sizeof(object) and round it up to the nearest multiple of the heap block size (typically 8 bytes). But if the whole allocated block goes to the allocated object, how can the heap manager tell the size of the object when I ask it to delete the pointer?

If the heap manager tracks the size of each object, does that mean I should add ~4 bytes per allocated object to the total memory consumption in my calculation, to cover the heap manager's internal bookkeeping? Or does it store this information in a much more compact form? What are the extra costs (memory-wise) of heap memory allocation?
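
For concreteness, this is the kind of back-of-the-envelope estimate I have in mind; the 8-byte rounding granularity and the 8-byte per-allocation overhead are just guesses on my part, not numbers taken from any real allocator:

```cpp
#include <cstddef>
#include <iostream>

// Estimated heap cost of one object: round its size up to the allocation
// granularity, then add an assumed fixed per-allocation overhead.
std::size_t estimated_heap_cost(std::size_t object_size,
                                std::size_t granularity = 8,   // assumed
                                std::size_t overhead    = 8)   // assumed
{
    std::size_t rounded = (object_size + granularity - 1) / granularity * granularity;
    return rounded + overhead;
}

int main()
{
    struct Small { int a, b; };                                // typically sizeof == 8
    std::cout << estimated_heap_cost(sizeof(Small)) << '\n';   // 16 with these guesses
}
```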

I understand that my question is very implementation-specific, but I would appreciate any hints about the heap metadata storage of major implementations such as gcc (or maybe it is really about libc).

Fyodor Menshikov
  • Extremely related: https://stackoverflow.com/q/197675/1896169 – Justin Apr 11 '18 at 23:24
  • Probably related: https://stackoverflow.com/questions/3479330/how-is-malloc-implemented-internally – eerorika Apr 11 '18 at 23:29
  • @Justin: Half the story. There's also `delete basePtr`, which may call `Derived::~Derived`. In that case `::operator delete` will need `sizeof(Derived)`, not `sizeof(Base)`. – MSalters Apr 11 '18 at 23:30
  • @Justin so should I expect +4-8 bytes per object for heap manager expenses in a real production implementation? That's twice the required amount for a small object! How come it is not optimized? – Fyodor Menshikov Apr 11 '18 at 23:46
  • @MSalters The size information is probably just associated with the pointer, not the objects and sub-objects. – Barmar Apr 11 '18 at 23:48
  • @FyodorMenshikov The answers to that question I linked are not entirely accurate. The best malloc/free implementations store the metadata separately. But yes, there will be somewhere around 4-8 bytes per allocation. Usually you don't want to allocate single small objects, but large arrays, or larger objects. For small objects, you often don't even need to heap-allocate and can simply pass them by value – Justin Apr 11 '18 at 23:48
  • @FyodorMenshikov It's possible to optimize, e.g. by using different heaps for different object sizes. So in a heap for objects < 256 bytes, only 1 byte would be needed for size info. But if you don't have dynamic allocation of lots of small objects, it might not be worth the added complexity. – Barmar Apr 11 '18 at 23:52
  • @Justin thank you! If you can add some references to implementation descriptions, or maybe to specific parts of the source code, that could be a full answer to my question. – Fyodor Menshikov Apr 11 '18 at 23:52
  • @Barmar I understand that something like this is possible in theory, but my question is about parameters of real major implementations. – Fyodor Menshikov Apr 11 '18 at 23:55
  • Often when using C++, the compiler will create little glue structures for you. These 4-12 byte allocations can add up. To combat this, a Unit Heap Allocator is often used. Basically, create a stack of fixed-size blocks with no explicit overhead per allocation. The problem is that each block is a fixed size - say 64 bytes. If you need less than the block size, you incur internal fragmentation. The good news is that this sort of allocator runs in O(1) time for alloc and free, where others often run in O(n), where n is the number of current allocations. – Michael Dorgan Apr 12 '18 at 00:05
  • @FyodorMenshikov: "*my question is about parameters of real major implementations*" - then you should actually look at those implementations for yourself. Every implementation is different. – Remy Lebeau Apr 12 '18 at 00:10
  • @MichaelDorgan I highly doubt that the complexity of a production heap manager can be O(n), because they are created to manage millions of objects. As for the Unit Heap Allocator, that's very valuable information. Could you reference an article or maybe some specific source code? – Fyodor Menshikov Apr 12 '18 at 00:10
  • Typing an answer for you now. And yes, O(n) is not unusual. I've fought the `malloc` battle many many times over the years when optimizing code. A simple linked list type tracking system is not unusual. – Michael Dorgan Apr 12 '18 at 00:12
  • @RemyLebeau I understand that implementations can differ, but maybe there are "best practices" among implementations? Maybe a heap manager is not "production ready" unless it does this, this, and this? – Fyodor Menshikov Apr 12 '18 at 00:16
  • @MichaelDorgan in my opinion, O(n) per allocation is not compatible with C++ STL data structures such as std::set and std::map. They have guaranteed O(log N) time for element removal. Or is there a special note in the C++ standard that these structures can have worse complexity due to an inadequate heap manager? – Fyodor Menshikov Apr 12 '18 at 00:22
  • @FyodorMenshikov if a memory manager is asked to allocate X bytes and it returns a pointer to X bytes, and then is asked to free the memory so it frees X bytes, it is production ready. There are no "best practices". HOW it allocates and frees those X bytes, and HOW it keeps track of how many bytes were allocated, is completely up to the implementor to decide. There are certain *optimizations* to reduce overhead (multiple heaps, metadata headers in front of allocated blocks, etc), but those are not *required*. – Remy Lebeau Apr 12 '18 at 00:22
  • @FyodorMenshikov `O(n)` complexities on additions and removals are related to how a container organizes, tracks, and traverses its data items, not how they are physically allocated in memory. – Remy Lebeau Apr 12 '18 at 00:23
  • @RemyLebeau could you cite the C++ Standard where it says something like "the complexity of std::map's insert method is O(log N) _plus_ the unspecified complexity of malloc"? This kind of statement looks very suspicious to me. Everywhere in descriptions of such data structures, allocation is assumed to be O(1). – Fyodor Menshikov Apr 12 '18 at 00:31

1 Answer


Heap allocators are not free. There is a per-block cost on allocation (both in size and, if a best-fit algorithm is used, in search time), the cost of joining blocks upon free, and any size lost per block when the size requested is smaller than the block returned. There is also a cost in terms of memory fragmentation. Consider placing a small 1-byte allocation in the middle of your heap: at that point, you can no longer return a contiguous block larger than half the heap - half your heap is fragmented. Good allocators fight all of the above and try to maximize all the benefits.

Consider the following allocation system (used in many real-world applications over a decade, on numerous handheld game devices).

Create a primary heap where each allocation has a prev ptr, a next ptr, a size, and possibly other info. Round it to 16 bytes per entry. Store this info before or after the actual memory pointer being returned - your choice, as there are advantages to each. Yes, you are allocating the requested size + 16 bytes here.
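
Something along these lines, purely as an illustration (field names and layout are made up; with 4-byte pointers it packs into 16 bytes, with 8-byte pointers it would be larger):

```cpp
#include <cstdint>

// Hypothetical per-allocation header stored just before (or after) the
// pointer handed back to the caller.
struct BlockHeader {
    BlockHeader*  prev;   // previous block in this heap's free/used list
    BlockHeader*  next;   // next block in this heap's free/used list
    std::uint32_t size;   // usable payload size in bytes
    std::uint32_t flags;  // free/used bit and other bookkeeping
};
```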

Now just keep a pointer to a free list and possibly a used list.

Allocation is done by finding a block on the free list big enough for our use and splitting it into the size requested and the remainder (first fit), or by searching the whole list for as exact a match as possible (best fit). Simple enough.
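
A first-fit walk could be sketched like this (simplified to a singly linked free list with made-up names; a real header would also carry the prev pointer from above):

```cpp
#include <cstddef>

struct FreeBlock {
    FreeBlock*  next;
    std::size_t size;     // usable bytes that follow this header
};

// First fit: return the first block big enough for `wanted`, splitting
// off the tail into a new free block when there is room for one.
// The caller uses (result + 1) as its memory.
FreeBlock* allocate_first_fit(FreeBlock*& free_list, std::size_t wanted)
{
    FreeBlock** link = &free_list;
    for (FreeBlock* b = free_list; b != nullptr; link = &b->next, b = b->next) {
        if (b->size < wanted)
            continue;                                 // too small, keep walking
        if (b->size >= wanted + sizeof(FreeBlock) + 16) {
            // Split: keep `wanted` bytes here, the remainder stays free.
            char* payload   = reinterpret_cast<char*>(b + 1);
            FreeBlock* rest = reinterpret_cast<FreeBlock*>(payload + wanted);
            rest->size = b->size - wanted - sizeof(FreeBlock);
            rest->next = b->next;
            b->size    = wanted;
            *link = rest;                             // remainder replaces b in the list
        } else {
            *link = b->next;                          // hand out the whole block
        }
        return b;
    }
    return nullptr;                                   // nothing big enough
}
```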

Freeing is moving the current item back into the free list, joining it with adjacent free areas where possible. You can see how this can go to O(n).
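
Freeing into an address-ordered free list, with coalescing of physically adjacent neighbours, might look like this; the walk to the insertion point is exactly where the O(n) comes from (again a simplified sketch, reusing the made-up node from above):

```cpp
#include <cstddef>

struct FreeBlock {                // same simplified node as in the sketch above
    FreeBlock*  next;
    std::size_t size;             // usable bytes that follow this header
};

// True when a's payload runs right up to the start of b.
static bool adjacent(const FreeBlock* a, const FreeBlock* b)
{
    return reinterpret_cast<const char*>(a + 1) + a->size ==
           reinterpret_cast<const char*>(b);
}

void free_block(FreeBlock*& free_list, FreeBlock* blk)
{
    // Walk to the spot that keeps the list sorted by address: O(n).
    FreeBlock* prev = nullptr;
    FreeBlock* cur  = free_list;
    while (cur != nullptr && cur < blk) {
        prev = cur;
        cur  = cur->next;
    }

    blk->next = cur;
    if (prev) prev->next = blk; else free_list = blk;

    if (cur && adjacent(blk, cur)) {          // merge with the following block
        blk->size += sizeof(FreeBlock) + cur->size;
        blk->next  = cur->next;
    }
    if (prev && adjacent(prev, blk)) {        // merge with the preceding block
        prev->size += sizeof(FreeBlock) + blk->size;
        prev->next  = blk->next;
    }
}
```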

For smaller allocations, get a single allocation (either from the newly created heap or from global memory) that will be your unit allocation zone. Split this area up into "block size" chunks and push their addresses onto a free stack. Allocation is popping an address from this stack. Freeing is just pushing the address back onto it - both O(1).
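
A bare-bones unit allocator along those lines (chunk size, container choices, and names are illustrative, not from any shipping implementation):

```cpp
#include <cstddef>
#include <vector>

// One big slab carved into fixed-size chunks whose addresses live on a
// free stack.  No per-chunk header, so no explicit overhead per allocation.
class UnitAllocator {
public:
    UnitAllocator(std::size_t chunk_size, std::size_t chunk_count)
        : slab_(chunk_size * chunk_count)
    {
        free_.reserve(chunk_count);
        for (std::size_t i = 0; i < chunk_count; ++i)
            free_.push_back(slab_.data() + i * chunk_size);
    }

    void* allocate()                          // O(1): pop an address
    {
        if (free_.empty()) return nullptr;    // pool exhausted
        void* p = free_.back();
        free_.pop_back();
        return p;
    }

    void deallocate(void* p)                  // O(1): push it back
    {
        free_.push_back(static_cast<char*>(p));
    }

private:
    std::vector<char>  slab_;                 // backing memory, allocated once
    std::vector<char*> free_;                 // stack of free chunk addresses
};
```

The pop/push pair is the entire allocate/free path, which is where both the O(1) behaviour and the zero per-allocation overhead come from.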

Then, inside your malloc/new/etc., check the size: if it is within the unit size, allocate from the unit allocator; otherwise use the O(n) allocator. My studies have shown that you can get 90-95% of allocations to fit within the unit allocator's block size without too much issue.
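
The front end can then be as small as this; the unit_*/list_* functions are stand-ins for the two allocators sketched above, not real library APIs:

```cpp
#include <cstddef>

constexpr std::size_t kUnitBlockSize = 64;   // illustrative unit size

void* unit_alloc();                  // O(1): pop a fixed-size chunk
void  unit_free(void* p);            // O(1): push it back
bool  unit_owns(const void* p);      // is p inside the unit pool's address range?

void* list_alloc(std::size_t size);  // O(n): first-fit / best-fit heap
void  list_free(void* p);

void* my_malloc(std::size_t size)
{
    return size <= kUnitBlockSize ? unit_alloc() : list_alloc(size);
}

void my_free(void* p)
{
    if (unit_owns(p)) unit_free(p);
    else              list_free(p);
}
```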

Also, you can allocate slabs of memory for memory pools and leave them allocated while using them over and over. A few larger allocations are much cheaper to manage (Unix systems use this a lot...).
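
One hypothetical shape for that idea - grab a slab once, hand out pieces with a bump pointer, and reset the whole thing when the frame or request is done - is sketched below; it is not how any particular Unix allocator works:

```cpp
#include <cstddef>
#include <vector>

class Arena {
public:
    explicit Arena(std::size_t bytes) : slab_(bytes), used_(0) {}

    void* allocate(std::size_t n)
    {
        n = (n + 15) & ~static_cast<std::size_t>(15);   // keep 16-byte alignment
        if (used_ + n > slab_.size()) return nullptr;   // slab exhausted
        void* p = slab_.data() + used_;
        used_ += n;
        return p;
    }

    void reset() { used_ = 0; }      // "free" everything at once, reuse the slab

private:
    std::vector<char> slab_;         // one large allocation, kept alive
    std::size_t       used_;
};
```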

Advantages:

  • All unit allocations have no external fragmentation.
  • Small allocations run in constant time.
  • Larger allocations can be tailored to the exact size of the data requested.

Disadvantages:

  • You pay an internal fragmentation cost when you want less memory than the block you are given.
  • Many, many allocations and frees outside your small block size will eventually fragment your memory. This can lead to all sorts of unpleasantness. This can be managed with OS help or with care on alloc/free order.
  • Once you get thousands of large allocations going, there is a CPU price. Slabs / multiple heaps can be used to fix this as well.

There are many, many schemes out there, but this one is simple and has been used in commercial apps for ages, so I wanted to start there.

Michael Dorgan
  • I understand your point that in some situations manual memory management has its benefits. I used such a scheme when I worked in gamedev. But my primary focus in this question is on the "out of the box" characteristics of a PC (= enough resources) heap manager. What should I expect from it? – Fyodor Menshikov Apr 12 '18 at 00:41
  • https://softwareengineering.stackexchange.com/questions/193575/performance-overhead-of-standard-containers-and-boost Look at the glibc malloc code. Look at other allocators. They all use the same tricks. Some rely on the OS to hand them virtual memory chunks big enough for their purposes and let the OS handle fragmentation and physical mapping. – Michael Dorgan Apr 12 '18 at 00:45
  • And `PC=enough resources` I have to admit made me giggle, though I understand your meaning. – Michael Dorgan Apr 12 '18 at 00:46