I did some experiments, read a chapter of The Linux Programming Interface (TLPI), and got a satisfying answer for myself.
First, my conclusions:
- The library call `malloc` uses the system calls `brk` and `mmap` under the hood when allocating memory.
- As @John Zwinck describes, `malloc` (at least glibc's implementation) chooses between `brk` and `mmap` depending on how much you request. Large requests (128 KiB and up by default, the `M_MMAP_THRESHOLD`) are served with `mmap`, and smaller ones come from the `brk`-managed heap.
- If the memory was allocated with `brk`, the process usually does not return it to the OS before it terminates (sometimes it does). If it was allocated with `mmap`, in my simple tests the process returned it to the OS when it was freed, before terminating.
Experiment code (watch the memory statistics in `htop` at the same time):
Code sample 1:

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdint.h>

#define BUFSIZE 1073741824 // 1 GiB

// The pointer array below is 2 GiB (2^28 pointers * 8 bytes) and lives on
// the stack, so run `ulimit -s unlimited` first.
int main(void) {
    printf("start\n");
    printf("%zu\n", sizeof(uint32_t));
    uint32_t *p_arr[BUFSIZE / 4];
    sleep(10);

    // 2^28 separate 4-byte allocations, all served from the brk heap.
    for (size_t i = 0; i < (BUFSIZE / 4); i++) {
        uint32_t *p = malloc(sizeof(uint32_t));
        if (p == NULL) {
            printf("alloc failed\n");
            exit(1);
        }
        p_arr[i] = p;
    }
    printf("alloc done\n");

    for (size_t i = 0; i < (BUFSIZE / 4); i++) {
        free(p_arr[i]);
    }
    printf("free done\n");
    sleep(20); // watch htop: the memory is still held by the process
    printf("exit\n");
    return 0;
}
```
When the program reaches "free done" and the following `sleep()`, you can see that it still holds the memory and has not returned it to the OS. Running `strace ./a.out` shows `brk` being called many times.
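Note: I am calling `malloc` in a loop to allocate the memory. I expected this to take up only 1 GiB of RAM, but in fact it takes up 8 GiB in total, because `malloc` adds extra bytes to every allocation for bookkeeping and alignment. On 64-bit glibc the smallest chunk `malloc` hands out is 32 bytes, so each 4-byte request costs 32 bytes, and 2^28 requests × 32 bytes already come to 8 GiB. You should never allocate 1 GiB this way, in a loop like this.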
Code sample 2:

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdint.h>

#define BUFSIZE 1073741824 // 1 GiB

// As in sample 1, the 2 GiB pointer array lives on the stack:
// run `ulimit -s unlimited` first.
int main(void) {
    printf("start\n");
    printf("%zu\n", sizeof(uint32_t));
    uint32_t *p_arr[BUFSIZE / 4];
    sleep(3);

    // First round: 2^28 small allocations.
    for (size_t i = 0; i < (BUFSIZE / 4); i++) {
        uint32_t *p = malloc(sizeof(uint32_t));
        if (p == NULL) {
            printf("alloc failed\n");
            exit(1);
        }
        p_arr[i] = p;
    }
    printf("%p\n", (void *)p_arr[0]);
    printf("alloc done\n");

    for (size_t i = 0; i < (BUFSIZE / 4); i++) {
        free(p_arr[i]);
    }
    printf("free done\n");

    // Second round: served from the freed (but not returned) memory,
    // so the process's memory usage should not grow.
    printf("allocate again\n");
    sleep(10);
    for (size_t i = 0; i < (BUFSIZE / 4); i++) {
        uint32_t *p = malloc(sizeof(uint32_t));
        if (p == NULL) {
            printf("alloc failed\n");
            exit(1);
        }
        p_arr[i] = p;
    }
    printf("allocate again done\n");
    sleep(10);

    for (size_t i = 0; i < (BUFSIZE / 4); i++) {
        free(p_arr[i]);
    }
    printf("%p\n", (void *)p_arr[0]);
    sleep(3);
    printf("exit\n");
    return 0;
}
```
This one is similar to sample 1, but it allocates again after `free`. The second allocation doesn't increase memory usage: it reuses the memory that was freed but not yet returned to the OS.
Code sample 3:

```c
#define _DEFAULT_SOURCE // exposes sbrk() in <unistd.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <assert.h>

#define MAX_ALLOCS 1000000

int main(int argc, char *argv[]) {
    int freeStep, freeMin, freeMax, blockSize, numAllocs, j;
    char *ptr[MAX_ALLOCS];

    if (argc < 3) {
        fprintf(stderr, "Usage: %s num-allocs block-size [step [min [max]]]\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    printf("\n");
    numAllocs = atoi(argv[1]);
    if (numAllocs > MAX_ALLOCS) {
        fprintf(stderr, "num-allocs must be <= %d\n", MAX_ALLOCS);
        exit(EXIT_FAILURE);
    }
    blockSize = atoi(argv[2]);
    freeStep = (argc > 3) ? atoi(argv[3]) : 1;
    freeMin = (argc > 4) ? atoi(argv[4]) : 1;
    freeMax = (argc > 5) ? atoi(argv[5]) : numAllocs;
    assert(freeMax <= numAllocs);

    printf("Initial program break: %10p\n", sbrk(0));
    printf("Allocating %d*%d bytes\n", numAllocs, blockSize);
    for (j = 0; j < numAllocs; j++) {
        ptr[j] = malloc(blockSize);
        if (ptr[j] == NULL) {
            perror("malloc returned NULL");
            exit(EXIT_FAILURE);
        }
    }
    printf("Program break is now: %10p\n", sbrk(0));

    printf("Freeing blocks from %d to %d in steps of %d\n", freeMin, freeMax, freeStep);
    for (j = freeMin - 1; j < freeMax; j += freeStep) {
        free(ptr[j]);
    }
    printf("After free(), program break is: %10p\n", sbrk(0));
    printf("\n");
    exit(EXIT_SUCCESS);
}
```
This one is taken from The Linux Programming Interface, slightly simplified. From Chapter 7:
The first two command-line arguments specify the number and size of
blocks to allocate. The third command-line argument specifies the loop
step unit to be used when freeing memory blocks. If we specify 1 here
(which is also the default if this argument is omitted), then the
program frees every memory block; if 2, then every second allocated
block; and so on. The fourth and fifth command-line arguments specify
the range of blocks that we wish to free. If these arguments are
omitted, then all allocated blocks (in steps given by the third
command-line argument) are freed.
Try running it with:

```shell
./free_and_sbrk 1000 10240 2
./free_and_sbrk 1000 10240 1 1 999
./free_and_sbrk 1000 10240 1 500 1000
```
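You will see that the program break decreases only in the last run. That is the only case where the process returns some memory to the OS. In the first two runs the last block (block 1000) stays allocated, so none of the freed memory sits at the top of the heap. In the last run, blocks 500 to 1000 form a free region of about 5 MB at the top of the heap. glibc's `free()` lowers the break once such a region is large enough, and TLPI gives 128 kB as a typical threshold.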
This sample is evidence for my claim above: "If memory is allocated with `brk`, the process usually does not return it to the OS before it terminates (sometimes it does)."
Finally, here are some useful paragraphs from the book. I suggest reading Chapter 7 (Section 7.1) of TLPI; it is very helpful.
In general, `free()` doesn’t lower the program break, but instead adds the block of memory to a list of free blocks that are recycled by future calls to `malloc()`. This is done for several reasons:
- The block of memory being freed is typically somewhere in the middle of
the heap, rather than at the end, so that lowering the program break
is not possible.
- It minimizes the number of `sbrk()` calls that the program must perform. (As noted in Section 3.1, system calls have a small but significant overhead.)
- In many cases, lowering the break
would not help programs that allocate large amounts of memory, since
they typically tend to hold on to allocated memory or repeatedly
release and reallocate memory, rather than release it all and then
continue to run for an extended period of time.
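What is the program break (also from the book): the program break is the current limit of the heap. Initially it lies just past the end of the uninitialized data segment. Allocating or releasing heap memory comes down to asking the kernel to move the break up or down with `brk()`/`sbrk()`.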
Also: https://www.wikiwand.com/en/Data_segment