
I'm working on a program that uses a large number of threads, each of which allocates a few megabytes of memory on the heap. When these threads finish, a large part of the RAM is still held by the program.

Here is an example that allocates and frees 1 MB in each of 500 threads, and shows the problem:

#include <algorithm> // std::fill
#include <chrono>    // std::chrono::seconds
#include <future>
#include <iostream>
#include <thread>    // std::this_thread::sleep_for
#include <vector>

// filling a 1 MB array with 0
void task() {
    const size_t S = 1000000;
    int * tab = new int[S];
    std::fill(tab, tab + S, 0);
    delete[] tab;
}

int main() {
    std::vector<std::future<void>> threads;
    const size_t N = 500;

    std::this_thread::sleep_for(std::chrono::seconds(5));
    std::cout << "Starting threads" << std::endl;

    for (size_t i = 0 ; i < N ; ++i) {
        threads.push_back(std::async(std::launch::async, [=]() { return task(); }));
    }

    for (size_t i = 0 ; i < N ; ++i) {
        threads[i].get();
    }

    std::cout << "Threads ended" << std::endl;
    std::this_thread::sleep_for(std::chrono::seconds(25));

    return 0;
}

On my computer, this code, built simply with g++ -o exe main.cpp -lpthread, uses 1976 kB before the message "Starting threads" and 419 MB after the message "Threads ended". These values are only examples: I get different numbers each time I run the program.

I have tried valgrind / memcheck, but it doesn't detect any leak.

I have noticed that protecting the "std::fill" operation with a mutex seems to solve this issue (or largely reduce it), but I don't think this is a race condition, as there is no shared memory here. I guess the mutex simply imposes an execution order on the threads that avoids (or reduces) the conditions under which the memory is retained.

I am using Ubuntu 18.04, with GCC 7.4.0.

Thank you for your help.

Aurélien


3 Answers


There is no memory leak at all, as Valgrind/memcheck already confirmed to you.

[...] uses 1976 kB before the message "Starting threads", and 419 MB after the message "Threads ended".

Two things:

  • At the beginning, your vector is empty.
  • At the end, your vector contains 500 std::future<void> objects.

This is why your memory consumption increased. Everything has a cost; you cannot store something in memory for free.
Consequently, your program behaves as expected.


By the way, you don't need to use a lambda, you could pass your function directly :)

Edit: For completeness, you should read @Marek R's answer, which mentions another side of the topic: memory released by the program (threads, dynamic allocations, ...) may not be immediately returned to the OS.


Edit2:

Concerning your point about the reduced memory consumption when you use a mutex: the mutex forces all of your threads to be executed sequentially (one at a time).

Knowing this, I guess the implementation may be able to optimize this by using only one thread and reusing it 500 times.
Since creating a thread has a cost (each thread gets its own stack, for example), creating one thread instead of 500 can significantly reduce your memory consumption.

Fareanor
  • Thank you for your answer, but I tried to clear or to completely delete the vector of `std::future` and it doesn't change the result. Moreover, it doesn't explain why the problem disappears when I add a mutex (I still have 500 `std::future` in my vector in this case). – Aurélien Feb 07 '20 at 12:41
  • @Aurélien This is normal, because clearing the vector does not release the allocated space. If you clear the vector, the effective size becomes zero, but behind the scenes the vector keeps its available space (its internal allocated buffer). [See this](https://godbolt.org/z/rxGxWt) The vector capacity is still `512` even after clearing it. This is how `std::vector` works. – Fareanor Feb 07 '20 at 12:45
  • That's true, I tried this to see if any call to a destructor would help. But what about the fact that there is far less memory used when I lock the task with a mutex? – Aurélien Feb 07 '20 at 12:52
  • @Aurélien I have edited my answer to address the mutex too. Another thing: don't call the destructor of an object living on the stack, because it will get destroyed twice (and so lead to undefined behaviour). Instead, you can use `std::vector::shrink_to_fit()` after the clearing in order to remove the unused capacity. – Fareanor Feb 07 '20 at 13:12

The whole mystery is hidden in the standard library, which is responsible for managing memory. Multithreading has an impact on memory consumption only because each thread needs quite a lot of memory (for some reason, most beginners forget about that).

When you call delete (or free in C), it doesn't mean that the memory returns to the system. It only means that the standard library marks this piece of memory as no longer needed.

Now, since requesting or releasing memory from/to the system is quite expensive and happens in page-sized chunks (typically 4 kB, larger on some hardware), the standard library tries to optimize that and doesn't return all memory to the system immediately. It assumes that the program may need this memory again soon.

So the memory consumed by a process is not a good indicator of a memory leak. Only when a process runs for a long time, stays in the same state, and continuously gains memory can you suspect that it leaks memory.
In all other cases you should rely on tools like valgrind (I recommend using AddressSanitizer).

There is also another optimization which has an impact on what you are seeing. Spawning a thread is costly, so when a thread completes its job it is not necessarily destroyed completely; it may be kept in a "thread pool" for future reuse.

Marek R
    New/delete typically call malloc/free (at least for libstdc++ and libc++). malloc is based on brk/sbrk/mmap https://stackoverflow.com/questions/5716100/what-happens-in-the-kernel-during-malloc. AFAIK freed memory can only be returned to the system if it is at the end of the `brk`. Otherwise it will be made available for subsequent calls to `malloc`. – Paul Floyd Feb 07 '20 at 09:48

I will assume you don't have 500 cores, so some of the threads will not run at the same time; some threads will finish before the last one starts, which is why you don't get to use

S * sizeof(int) * n = 1000000 * 4 * 500 = 2000000000 (~2GB)

What happens is that you allocate at most ~419 MB; the memory freed by the first threads is then reused by the last ones.

And the program doesn't return its maximum used memory to the OS before it quits.

Surt