Dynamic memory allocation and memory block metadata

Question

I have a question about low level stuff of dynamic memory allocation. I understand that there may be different implementations, but I need to understand the fundamental ideas.

So,

when a modern OS memory allocator or the equivalent allocates a block of memory, this block needs to be freed.

But, before that happends, some system needs to exist to control the allocation process.

I need to know:

How this system keeps track of allocated and unallocated memory. I mean, the system needs to know what blocks have already been allocated and what their size is to use this information in allocation and deallocation process.

Is this process supported by modern hardware, like allocation bits or something like that? Or is some kind of data structure used to store allocation information. If there is a data structure, how much memory it uses compared to the allocated memory?

Is it better to allocate memory in big chunks rather than small ones and why?

Any answer that can help reveal fundamental implementation details is appreciated.

If there is a need for code examples, C or C++ will be just fine.

I am a self taught programmer. I use Google and books. But finding such information will take quite long. There is nothing wrong about taking time to learn something, but there is so much to learn. That is why I started this thread. I am 24. — Vakhtang Tevdorashvili, Jun 03 '13 at 17:41

score 2 · Accepted Answer · edited May 23 '17 at 12:05

"How this system keeps track of allocated and unallocated memory." For non-embedded systems with operating systems, a virtual page table, which the OS is in charge of organizing (with hardware TLB support of course), tracks the memory usage of programs.

AS FAR AS I KNOW (and the community will surely yell at me if I'm mistaken), tracking individual malloc() sizes and locations has a good number of implementations and is runtime-library dependent. Generally speaking, whenever you call malloc(), the size and location is stored in a table. Whenever you call free(), the table entry for the provided pointer is looked up. If it is found, that entry is removed. If it is not found, the free() is ignored (which also indicates a possible memory leak).

When all malloc() entries in a virtual page are freed, that virtual page is then released back to the OS (this also implies that free() does not always release memory back to the OS since the virtual page may still have other malloc() entries in it). If there is not enough space within a given virtual page to support another malloc() of a specified size, another virtual page is requested from the OS.

Embedded processors usually don't have operating systems, virtual page tables, nor multiple processes. In this case, virtual memory is not used. Instead, the entire memory of the embedded processor is treated like one large virtual page (although the addresses are actually physical addresses) and memory management follows a similar process as previously described.

Here is a similar stack overflow question with more in-depth answers.

"Is it better to allocate memory in big chunks rather than small ones and why?" Allocate as much memory as you need, no more and no less. Compiler optimizations are very smart, and memory will almost always be managed more efficiently (i.e. reducing memory fragmentation) than the programmer can manually do. This is especially true in a non-embedded environment.

Here is a similar stack overflow question with more in-depth answers (note that it pertains to C and not C++, however it is still relevant to this discussion).

Great answer, however I'm not sure I understand this part correctly: "tracking individual malloc() sizes and locations has a good number of implementations and is compiler dependent." What does the compiler have to do with malloc() ? Do you mean "allocator dependent" ? Or did I misunderstand you ? — Xaqq, Jun 03 '13 at 17:39
The implementation of dynamic memory allocation can have a huge impact on the system. Will not storing data about every allocation be an overkill? It may halve usable main memory in a worst case. — Vakhtang Tevdorashvili, Jun 03 '13 at 17:59
@Xaqq I think you are right and I was mistaken when I said "compiler dependent". I should have said ["runtime-library dependent"](http://stackoverflow.com/questions/9334957/malloc-in-an-embedded-system-without-an-operating-system), and I will edit that soon. — Suedocode, Jun 03 '13 at 18:59
@VakhtangTevdorashvili The system needs to track where all of the memory blocks are located. The actual implementations of allocators are buried deep in the standard library in a convoluted way that isn't meant for humans to understand, but the system must know exactly what block to free (size and location, exactly 2 words per allocation total) when you free it. *According to my explanation which may be flawed*, effective memory may be worse than half if you keep allocating single words onto the heap, but generally speaking that's really rare and really bad. — Suedocode, Jun 03 '13 at 19:09
@VakhtangTevdorashvili Additionally, I would not be surprised if certain repetitive or permanent allocations are optimized by the compiler to significantly reduce the overhead; compilers are very smart, and should not be underestimated. I believe for most applications it is unnecessary and prone to bugs, but manually managing a larger malloc in the place of much smaller mallocs may increase speed, but I have not tested the gains myself. — Suedocode, Jun 03 '13 at 19:20
Thank you for replies. Additional reference helped me a lot. — Vakhtang Tevdorashvili, Jun 04 '13 at 05:23

score 1 · Answer 2 · answered Jun 03 '13 at 17:09

Well, there are more than one way to achieve that. I once had to wrote a malloc() (and free()) implementation for educational purpose.

This is from my experience, and real world implementation surely vary.

I used a double linked list. Memory chunk returned to the user after calling malloc() were in fact a struct containing relevant information to my implementation (ie the next and prev pointer, and a is_used byte).

So when a user request N bytes I allocated N + sizeof(my_struct) bytes, hiding next and prev pointers at the begenning of the chunk, and returning what's left to the user.

Surely, this is poor design for a program that use a lot of small allocation (because each allocation takes up to N + 2 pointers + 1 byte).

For a real world implementation, you can take a look to the code of good and well known memory allocator.

score 1 · Answer 3 · answered Jun 03 '13 at 20:24

Normally there exist two different layers.

One layer lives at application level, usually as part of the C standard library. This is what you call through functions like malloc and free (or operator new in C++, which in turn usually calls malloc). This layer takes care of your allocations, but does not know about memory or where it comes from.

The other layer, at OS level, does not know and does not care anything about your allocations. It only maintains a list of fixed-size memory pages that have been reserved, allocated, and accessed, and with each page information such as where it maps to.

There are many different implementations for either layer, but in general it works like this:
When you allocate memory, the allocator (the "application level part") looks whether it has a matching block somewhere in its books that it can give to you (some allocators will split a larger block in two, if need be).

If it doesn't find a suitable block, it reserves a new block (usually much larger than what you ask for) from the operating system. sbrk or mmap on Linux, or VirtualAlloc on Windows would be typical examples of functions it might use for that effect.
This does very little apart from showing intent to the operating system, and generating some page table entries.
The allocator then (logically, in its books) splits up that large area into smaller pieces according to its normal mode of operation, finds a suitable block, and returns it to you. Note that this returned memory does not necessarily even exist as phsyical memory (though most allocators write some metadata into the first few bytes of each allocated unit, so they necessarily pre-fault the pages).

In the mean time, invisibly, a background task zeroes out memory pages that were in use by some process once but have been freed. This happens all the time, on a tentative base, since sooner or later someone will ask for memory (often, that's what the idle task does).

Once you access an address in the page that contains your allocated block for the first time, you generate a fault. The page table entry of this yet non-existent page (it logically exists, just not phsyically) is replaced with a reference to a page from the pool of zero pages. In the uncommon case that there is none left, for example if huge amounts of memory are being allocated all the time, the OS swaps out a page which it believes will not be accessed any time soon, zeroes it, and returns this one.
Now the page becomes part of your working set, it corresponds to actual phsyical memory, and it accounts towards your process' quota. While your process is running, pages may be moved in and out of your working set, or may be paged out and in, as you exceed certain limits, and according to how much memory is needed and how it is accessed.

Once you call free, the allocator puts the freed area back into its books. It may tell the OS that it does not need the memory any more instead, but usually this does not happen as it is not really necessary and it is more efficient to keep around a little extra memory and reuse it. Also, it may not be easy to free the memory because usually the units that you allocate/deallocate do not directly correspond with the units the OS works with (and, in the case of sbrk they'd need to happen in the correct order, too).

When the process ends, the OS simply throws away all page table entries and adds all pages to the list of pages that the idle task will zero out. So the physical memory becomes available to the next process asking for some.

It seemed to me that OS did the most part of allocating memory. I thought malloc uses OS allocation functions and thats it, but run-time libraries and malloc particularly do a lot of plumbing as I see. This changes the way I think about programs and programming. That's great. — Vakhtang Tevdorashvili, Jun 04 '13 at 05:43

Dynamic memory allocation and memory block metadata

3 Answers3