
I have a C++ process that ingests large blocks of data and stores them in memory. The storage array contains roughly 10 GB of data partitioned into 4 MB blocks. As new data arrives, the process creates a new block and, once the store is full, deletes the oldest block. The process cycles through the full circular buffer once every 10-60 seconds. We are running on x86_64 RHEL 5 and RHEL 6 and compiling with the Intel 14 compiler.
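In simplified form, the block lifecycle looks roughly like this (the names here are illustrative, not our real code):

```cpp
// Simplified illustration only; the real block/store types are more involved.
#include <cstddef>
#include <deque>
#include <vector>

struct Block {
    std::vector<char> data;                      // one 4 MB chunk of converted data
    explicit Block(std::size_t n) : data(n) {}
};

class BlockStore {
    std::deque<Block*> blocks_;                  // oldest block at the front
    std::size_t maxBytes_;
    static const std::size_t kBlockSize = 4 * 1024 * 1024;
public:
    explicit BlockStore(std::size_t maxBytes) : maxBytes_(maxBytes) {}

    void addBlock() {
        blocks_.push_back(new Block(kBlockSize));          // new data arrives
        while (blocks_.size() * kBlockSize > maxBytes_) {  // store is full
            delete blocks_.front();                        // drop the oldest block
            blocks_.pop_front();
        }
    }

    ~BlockStore() {
        for (std::size_t i = 0; i < blocks_.size(); ++i)
            delete blocks_[i];
    }
};
```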

We are seeing a problem where the overall process memory usage grows over time until the OS runs out of memory and eventually the box dies. We have been looking for memory leaks and running the process through TotalView to determine where the memory is going, but no leaks are reported.

On the heap report produced by TotalView we saw the 10 GB of memory allocated for the stored data, but we also saw 4+ GB of "deallocated" memory. Looking through the heap display, it appeared that our heap was very fragmented: large chunks of "allocated" memory interspersed with large chunks of "deallocated" memory.

  • Is the "deallocated" memory freed by my process but not yet reclaimed by the OS, and is it reasonable to think that this may be the source of our memory "leak"?

  • If so, how do I get the OS to reclaim the memory?

  • Do we need to rework our process to reuse discarded data blocks instead of relying on the OS to do our memory management for us?

mjr
  • If memory consumption keeps going up that generally means that you `new` up some memory and never `delete` it. Have you tried using something like gdb? – NathanOliver Jan 20 '16 at 17:36
  • We have been running the memory analysis and leak detection tools provided by TotalView and it is not reporting any un-deleted `new`s. I also put cerr's in the constructor and destructor of our data blocks to verify we weren't losing any. – mjr Jan 20 '16 at 17:47
  • What kind of C++ application are you coding? A data server one, or an HPC (numerical computation) one? How long does it run (hours or months)? – Basile Starynkevitch Jan 20 '16 at 17:52
  • I don't think there's a "best" way to avoid fragmentation, but pooling allocators can be a good idea if there are many equally-sized blocks. – molbdnilo Jan 20 '16 at 17:54
  • This process ingests data, converts it to the format we want, and then stores it, acting as a giant circular buffer. It runs indefinitely and cycles through its data every 10-60 seconds depending on the configuration. – mjr Jan 20 '16 at 17:56
  • @mjr: that should go into the question (so please edit it). And you could also tell more about your OS and compiler & compilation options – Basile Starynkevitch Jan 20 '16 at 18:14
  • @mjr: Your questions are not generic C++ questions (if you do not have a genuine leak) but depend on (1) your OS, (2) your C++ implementation and, possibly, (3) your C implementation which might be providing the `malloc` and `free` to the C++ implementation. Without those 2 or 3 parameters, this question is incomplete and cannot be answered. – Matthieu M. Jan 20 '16 at 18:32
  • @BasileStarynkevitch: looks like HPC since [TotalView](http://www.roguewave.com/products-services/totalview) is built for it... I guess :( – Matthieu M. Jan 20 '16 at 18:33
  • @mjr: **you should edit your question to improve it** (don't comment it here)! – Basile Starynkevitch Jan 20 '16 at 20:38
  • The kind of application (data server / HPC) is not relevant to the problem of our heap continuing to grow even though we delete the old blocks. – mjr Jan 20 '16 at 21:18
  • I added the OS and compiler info. – mjr Jan 20 '16 at 21:24
  • @mjr: on the contrary, the kind of application does matter a lot – Basile Starynkevitch Jan 20 '16 at 21:42

1 Answer


I guess (and hope for you) that you are on Linux (if porting your code to Linux is doable, consider that since Linux has good tools for such issues).

Then:

  • use C++11 (or C++14) and learn about move semantics, smart pointers, and the rule of five (see the sketch after this list).

  • use valgrind

  • use some sanitizers from a recent GCC or Clang/LLVM compiler. Read about the -fsanitize=... debugging options; you probably want -fsanitize=address at least during debugging.
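For the first point, making the block type move-only and letting a smart pointer own its storage removes a whole class of double-delete and forgotten-delete bugs. A minimal sketch (your real block type will differ):

```cpp
// Sketch only: a movable, non-copyable 4 MB block whose bytes are owned by
// a unique_ptr, so a delete can never be forgotten or done twice.
#include <cstddef>
#include <memory>

class DataBlock {
    std::unique_ptr<char[]> bytes_;
    std::size_t size_;
public:
    explicit DataBlock(std::size_t n) : bytes_(new char[n]), size_(n) {}

    DataBlock(const DataBlock&) = delete;              // no copies
    DataBlock& operator=(const DataBlock&) = delete;

    DataBlock(DataBlock&&) = default;                  // cheap moves
    DataBlock& operator=(DataBlock&&) = default;
    ~DataBlock() = default;                            // unique_ptr frees the bytes

    std::size_t size() const { return size_; }
};

// The circular buffer can then hold values rather than raw pointers,
// e.g. std::deque<DataBlock>; popping an element releases its memory.
```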

The above will help you catch any remaining memory leaks. Be prepared to spend weeks on them. You might need to disable ASLR, and you should learn about gdb watchpoints.

You might also consider using Boehm's conservative garbage collector. See this for using it in standard C++ containers. If you do use Boehm's GC, you had better use it nearly everywhere in your program ...
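In its simplest form that means allocating through the collector and never freeing explicitly; a minimal sketch, assuming libgc is installed and linked with -lgc:

```cpp
// Sketch, assuming the Boehm GC (libgc) is available and linked with -lgc.
#include <gc.h>       // GC_INIT, GC_MALLOC_ATOMIC
#include <cstddef>

int main() {
    GC_INIT();
    const std::size_t kBlockSize = 4 * 1024 * 1024;

    // Pointer-free payloads should use GC_MALLOC_ATOMIC so the collector
    // does not scan their 4 MB of bytes looking for pointers.
    char* block = static_cast<char*>(GC_MALLOC_ATOMIC(kBlockSize));
    (void)block;      // ... fill with data; no explicit free, the GC reclaims it
    return 0;
}
```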

Genuine fragmentation may happen (even if you are sure you have avoided memory leaks and have checked that, e.g., with valgrind), in particular in long-lived processes. In such cases, you might consider having your own application checkpointing facilities (which are also useful for restarting a long-lived computation). If you have thought about it early enough (checkpointing should be an early architectural design decision!), you could checkpoint your state to disk once in a while (e.g. every hour) and restart a fresh process. This can be a good memory-compacting strategy.
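A checkpoint can be as simple as streaming the live blocks to a file that a freshly started process reloads at startup. A sketch with illustrative names (substitute your real block container):

```cpp
// Sketch only: dump each live block to disk so a fresh process can reload it
// at startup and resume with a brand-new, unfragmented heap.
#include <cstddef>
#include <fstream>
#include <vector>

void checkpoint(const std::vector<std::vector<char> >& blocks, const char* path) {
    std::ofstream out(path, std::ios::binary);
    for (std::size_t i = 0; i < blocks.size(); ++i) {
        const std::size_t n = blocks[i].size();
        out.write(reinterpret_cast<const char*>(&n), sizeof n);  // length prefix
        out.write(&blocks[i][0], n);                             // raw block bytes
    }
}
```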

You could (but I don't necessarily recommend it) write your own memory allocator on top of OS virtual address space primitives like mmap(2) (perhaps with MAP_HUGETLB ....) & munmap; you might have your own allocator and deallocator (at least for large-sized objects), or provide operator new & operator delete, etc..., in some of your classes; read about the C++ allocator concept. But note that your standard new and delete (and the malloc & free for C code, often used underneath C++ new & delete) are already using those primitives.
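For fixed 4 MB blocks, class-scoped operator new and operator delete that go straight to mmap/munmap guarantee that deleting a block returns its pages to the kernel immediately. A Linux-specific sketch:

```cpp
// Sketch, Linux-specific: each Block gets its own anonymous mapping, so
// operator delete hands the pages straight back to the kernel via munmap.
#include <sys/mman.h>
#include <cstddef>
#include <new>

struct Block {
    static const std::size_t kSize = 4 * 1024 * 1024;
    char data[kSize];

    static void* operator new(std::size_t n) {
        void* p = mmap(0, n, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            throw std::bad_alloc();
        return p;
    }

    static void operator delete(void* p, std::size_t n) {
        if (p)
            munmap(p, n);
    }
};

// Usage: Block* b = new Block;  ...  delete b;  // pages are unmapped here
```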

Notice that most free or delete calls do not invoke munmap, but simply mark the released memory as reusable by future malloc or new calls ...

You definitely should become more familiar with garbage collection techniques and terminology. Read the GC handbook.

See also mallinfo(3), mallopt(3) & proc(5) (perhaps use /proc/self/maps, /proc/self/smaps & /proc/self/statm from inside your program to learn about your heap, or the pmap command). Maybe strace(1) could be useful (to understand which syscalls(2) happen).
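For example, with glibc's malloc, lowering M_MMAP_THRESHOLD below your block size makes each 4 MB block its own mmap, so freeing a block returns its pages to the kernel. A glibc-specific sketch:

```cpp
// Sketch, glibc-specific: force allocations >= 1 MB to be served by mmap so
// that freeing a 4 MB block immediately returns its pages to the kernel.
#include <malloc.h>   // mallopt, M_MMAP_THRESHOLD (glibc)

int main() {
    // Setting the threshold explicitly also disables glibc's dynamic
    // adjustment, which would otherwise raise it above 4 MB over time.
    mallopt(M_MMAP_THRESHOLD, 1024 * 1024);

    // ... rest of the program; new/delete of 4 MB blocks now map/unmap.
    return 0;
}
```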

Basile Starynkevitch
  • I sympathize with your eagerness to help, however I am afraid that by rushing in answering the question your hard work might end up wasted if the OP finally clarifies the question: it might after all be due to the OS/malloc. – Matthieu M. Jan 20 '16 at 18:35
  • Thanks for the info on mallinfo and mallopt: "Allocating memory using mmap(2) has the significant advantage that the allocated memory blocks can always be independently released back to the system. (By contrast, the heap can be trimmed only if memory is freed at the top end.) ". It also looks like the threshold is dynamic by default so it would learn about my 4MB allocation chunks and raise the threshold. – mjr Jan 20 '16 at 21:41
  • I set M_MMAP_THRESHOLD to less than my block size and am cautiously optimistic based on a couple of minutes of runs. I'll know in the morning if that solves my problems. Can you think of any caveats to setting that? – mjr Jan 20 '16 at 22:15
  • With M_MMAP_THRESHOLD set to less than my block size, I now see my memory usage remain constant at my expected data store size. Thank you. – mjr Jan 21 '16 at 12:48