Find out where heap memory gets corrupted

Question

I know many similar questions and answers already exist, but I still cannot solve my problem.

In my large application the heap is getting corrupted somewhere and I cannot locate where. I tried tools such as gflags, but with no luck.

I tried gflags on the following sample, which corrupts the heap on purpose:

char* pBuffer = new char[256];
memset(pBuffer, 0, 256 + 1);
delete[] pBuffer;

Line 2 writes one byte past the end of the buffer, but how can I find that with tools like gflags, WinDbg, etc.? Maybe I am not using gflags properly.

Why `256 + 1` in `memset`, when you have allocated just `256` bytes? — T.Z, Nov 24 '15 at 10:48
In your larger application, how did you know that the heap was corrupted? What tool informed you of this? — PaulMcKenzie, Nov 24 '15 at 10:48
@T.Z. To demonstrate the sort of corruption that could occur... — StoryTeller - Unslander Monica, Nov 24 '15 at 10:49
@T.Z: I think the OP is trying to illustrate that the tools he is using (WinDbg, GFlags) do not catch this type of error. — Paul R, Nov 24 '15 at 10:49
@T.Z It is a sample program to simulate a real world problem — Anil8753, Nov 24 '15 at 10:50
If your code is reasonably portable and will run on an OS such as Linux, then you could use [valgrind](http://valgrind.org) to debug memory-related problems such as this. — Paul R, Nov 24 '15 at 10:52
@YSC: sorry - which of my above comments is not constructive ? Suggesting Linux+valgrind ? If it's just a command line program then debugging on Linux with valgrind might actually be a sensible strategy. — Paul R, Nov 24 '15 at 10:53
Can you try and use an alternative to gflags? Here is a [list of potential tools to look for memory corruption on Windows](http://stackoverflow.com/questions/413477/is-there-a-good-valgrind-substitute-for-windows). — YSC, Nov 24 '15 at 10:55
@Anil8753 `I tried gflags on the following sample which corrupts the heap by purpose:` That line of code does not guarantee that you've actually corrupted the heap. If any check were available, it would be done via "guard bytes" being changed, a strategy that the Microsoft debug runtime uses. — PaulMcKenzie, Nov 24 '15 at 10:58
Also interested how to find these on windows. I use valgrind on linux to find these things, but this does not help if the code in question cannot be compiled for linux. — Ludwig Schulze, Nov 24 '15 at 10:58
Please note that your sample does not **always** corrupt the heap, because the next heap allocation headers might be pretty far away from allocated memory. A more interesting strategy would be to allocate data, and write to decreasing memory addresses from the start of your buffer. — SirDarius, Nov 24 '15 at 10:58
@SirDarius I accept that it does not always corrupt the heap, but it can. How can one find these issues? — Anil8753, Nov 24 '15 at 11:20

Jeremy Friesner · Answer 1 · 2016-02-25T21:02:30.320

If automated tools (like Electric Fence or valgrind) don't do the trick, staring intently at your code doesn't reveal where it went wrong, and enabling/disabling various operations (to find a correlation between the heap corruption and which operations ran beforehand) doesn't narrow it down, you can always try this technique. It attempts to detect the corruption sooner rather than later, which makes the source easier to track down:

Create your own custom new and delete operators that put corruption-evident guard areas around the allocated memory regions, something like this:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <new>

// make this however big you feel is "big enough" so that corrupted bytes will be seen in the guard bands
// (const, because nothing should ever change it while allocations are live)
static const int GUARD_BAND_SIZE_BYTES = 64;

static void * MyCustomAlloc(size_t userNumBytes)
{
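    // We'll allocate space for a guard-band, then space to store the user's allocation-size-value,
    // then space for the user's actual data bytes, then finally space for a second guard-band at the end.
    // Note: the returned pointer sits GUARD_BAND_SIZE_BYTES+sizeof(size_t) bytes past malloc()'s result,
    // so it is only as aligned as that offset allows; pad the header if you need stricter alignment.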
    char * buf = (char *) malloc(GUARD_BAND_SIZE_BYTES+sizeof(userNumBytes)+userNumBytes+GUARD_BAND_SIZE_BYTES);
    if (buf)
    {
       char * w = buf;
       memset(w, 'B', GUARD_BAND_SIZE_BYTES);          w += GUARD_BAND_SIZE_BYTES;
       memcpy(w, &userNumBytes, sizeof(userNumBytes)); w += sizeof(userNumBytes);
       char * userRetVal = w;                          w += userNumBytes;
       memset(w, 'E', GUARD_BAND_SIZE_BYTES);          w += GUARD_BAND_SIZE_BYTES;
       return userRetVal;
    }
    else throw std::bad_alloc();
}

static void MyCustomDelete(void * p)
{
    if (p == NULL) return;   // since delete NULL is a safe no-op

    // Convert the user's pointer back to a pointer to the top of our header bytes
    char * internalCP = ((char *) p)-(GUARD_BAND_SIZE_BYTES+sizeof(size_t));

    char * cp = internalCP;
    for (int i=0; i<GUARD_BAND_SIZE_BYTES; i++)
    {
        if (*cp++ != 'B')
        {
            printf("CORRUPTION DETECTED at BEGIN GUARD BAND POSITION %i of allocation %p\n", i, p);
            abort();
        }
    }

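    // At this point, (cp) should be pointing to the stored (userNumBytes) field
    size_t userNumBytes;
    memcpy(&userNumBytes, cp, sizeof(userNumBytes));  // mirrors the memcpy() in MyCustomAlloc()
    cp += sizeof(userNumBytes);  // skip past the stored size field
    cp += userNumBytes;          // skip past the user's data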

    // At this point, (cp) should be pointing to the second guard band
    for (int i=0; i<GUARD_BAND_SIZE_BYTES; i++)
    {
        if (*cp++ != 'E')
        {
            printf("CORRUPTION DETECTED at END GUARD BAND POSITION %i of allocation %p\n", i, p);
            abort();
        }
    }

    // If we got here, no corruption was detected, so free the memory and carry on
    free(internalCP);
}

// override the global C++ new/delete operators to call our
// instrumented functions rather than their normal behavior.
// (Dynamic exception specifications such as throw(std::bad_alloc) were removed in C++17;
// on a pre-C++11 compiler, write throw() in place of noexcept.)
void * operator new(size_t s)             {return MyCustomAlloc(s);}
void * operator new[](size_t s)           {return MyCustomAlloc(s);}
void operator delete(void * p) noexcept   {MyCustomDelete(p);}
void operator delete[](void * p) noexcept {MyCustomDelete(p);}

The above is enough to get you Electric-Fence style functionality: if anything writes into either of the two 64-byte "guard bands" at the beginning or end of an allocation made with new, then when that allocation is deleted, MyCustomDelete() will notice the corruption and abort the program.

If that's not good enough (e.g. because so much has happened between the corruption and the deletion that it's difficult to tell what caused it), you can go further: have MyCustomAlloc() add each allocated buffer to a singleton/global doubly-linked list of allocations, and have MyCustomDelete() remove it from that same list (make sure to serialize these operations if your program is multithreaded!).

The advantage is that you can then add another function, called e.g. CheckForHeapCorruption(), that iterates over that linked list, checks the guard bands of every allocation in it, and reports any that have been corrupted. Sprinkle calls to CheckForHeapCorruption() throughout your code, so that heap corruption is detected at the next call rather than some time later.

Eventually you will find that one call to CheckForHeapCorruption() passed with flying colors and the next call, just a few lines later, detected corruption. At that point you know the corruption was caused by whatever code executed between the two calls, and you can study that particular code to figure out what it's doing wrong, and/or add more calls to CheckForHeapCorruption() inside it as necessary.
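The linked-list idea described above could be sketched roughly like this. This is only an illustration under my own assumptions: the `guarded` namespace, `Alloc()`, `Free()`, the `Header` struct and the choice to return a count from `CheckForHeapCorruption()` are all made-up names and decisions, and the functions are called explicitly rather than wired into operator new/delete. The header is padded to `std::max_align_t` so the user pointer keeps malloc-like alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <mutex>

namespace guarded {   // hypothetical names, not part of any library

const std::size_t kGuardBytes = 64;   // size of each guard band

// Bookkeeping node placed in front of the leading guard band.
struct alignas(std::max_align_t) Header {
    Header *    prev;
    Header *    next;
    std::size_t userBytes;
};

static Header *   g_head = nullptr;   // list of all live allocations
static std::mutex g_lock;             // serializes every list access

static unsigned char * FrontGuard(Header * h) { return reinterpret_cast<unsigned char *>(h + 1); }
static unsigned char * UserData(Header * h)   { return FrontGuard(h) + kGuardBytes; }
static unsigned char * BackGuard(Header * h)  { return UserData(h) + h->userBytes; }

static bool BandIntact(const unsigned char * band, unsigned char fill)
{
    for (std::size_t i = 0; i < kGuardBytes; i++) if (band[i] != fill) return false;
    return true;
}

void * Alloc(std::size_t userBytes)
{
    void * raw = std::malloc(sizeof(Header) + kGuardBytes + userBytes + kGuardBytes);
    if (raw == nullptr) return nullptr;
    Header * h = static_cast<Header *>(raw);
    h->userBytes = userBytes;
    std::memset(FrontGuard(h), 'B', kGuardBytes);
    std::memset(BackGuard(h),  'E', kGuardBytes);

    std::lock_guard<std::mutex> hold(g_lock);
    h->prev = nullptr;                // push onto the front of the list
    h->next = g_head;
    if (g_head) g_head->prev = h;
    g_head = h;
    return UserData(h);
}

void Free(void * p)
{
    if (p == nullptr) return;
    Header * h = reinterpret_cast<Header *>(static_cast<unsigned char *>(p) - kGuardBytes) - 1;
    std::lock_guard<std::mutex> hold(g_lock);
    assert(g_head != nullptr);        // an empty list means p was never ours
    if (h->prev) h->prev->next = h->next; else g_head = h->next;
    if (h->next) h->next->prev = h->prev;
    std::free(h);
}

// Walks every live allocation and returns how many have a damaged guard band.
std::size_t CheckForHeapCorruption()
{
    std::lock_guard<std::mutex> hold(g_lock);
    std::size_t damaged = 0;
    for (Header * h = g_head; h != nullptr; h = h->next)
        if (!BandIntact(FrontGuard(h), 'B') || !BandIntact(BackGuard(h), 'E')) damaged++;
    return damaged;
}

} // namespace guarded
```

With something like this in place, the question's `memset(pBuffer, 0, 256 + 1)` on a 256-byte `guarded::Alloc()` buffer would make the next `guarded::CheckForHeapCorruption()` return 1 instead of 0, without waiting for the buffer to be freed.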

Repeat until the bug becomes obvious. Good luck!

score 1 · Answer 2 · answered Nov 26 '15 at 16:35

If the same variable is consistently being corrupted, data breakpoints are a quick and simple way to find the code responsible for the change, if your IDE supports them (Debug->New Break Point->New Data Breakpoint... in MS Visual Studio 2008). They won't help if your heap corruption is more random, but I figured I'd share the simple answer in case it helps.

score 0 · Answer 3 · answered Feb 25 '16 at 20:21

There's a tool called Electric Fence that I think is also supported on Windows.

Essentially, it hijacks malloc and co. so that every allocation ends at a page boundary, and it marks the next page inaccessible.

The effect is that you get a segfault at the moment of a buffer overrun.

It probably also has an option for buffer underruns.
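To illustrate the page-boundary trick described above, here is a minimal sketch. It is not Electric Fence's actual code: `FenceAlloc()`, `FenceFree()` and `EndsAtPageBoundary()` are hypothetical names, and it assumes a POSIX system with `mmap`/`mprotect` (on Windows the corresponding calls would be `VirtualAlloc`/`VirtualProtect`). Note that the returned pointer is only as aligned as the requested size allows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helpers; a POSIX-only sketch of the guard-page technique.
static std::size_t PageSize() { return static_cast<std::size_t>(sysconf(_SC_PAGESIZE)); }

// Returns userBytes of memory whose last byte is the last byte of a page.
// The page right after it is PROT_NONE, so the first overrunning write faults.
void * FenceAlloc(std::size_t userBytes)
{
    assert(userBytes > 0);
    const std::size_t page      = PageSize();
    const std::size_t dataPages = (userBytes + page - 1) / page;
    const std::size_t total     = (dataPages + 1) * page;   // +1 for the guard page

    void * raw = mmap(nullptr, total, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED) return nullptr;

    unsigned char * guard = static_cast<unsigned char *>(raw) + dataPages * page;
    if (mprotect(guard, page, PROT_NONE) != 0) { munmap(raw, total); return nullptr; }
    return guard - userBytes;   // user data ends exactly where the guard page begins
}

// The caller passes the same size it allocated, so the mapping can be recomputed.
void FenceFree(void * p, std::size_t userBytes)
{
    if (p == nullptr) return;
    const std::size_t page      = PageSize();
    const std::size_t dataPages = (userBytes + page - 1) / page;
    unsigned char * guard = static_cast<unsigned char *>(p) + userBytes;
    munmap(guard - dataPages * page, (dataPages + 1) * page);
}

// True if the byte just past the buffer is the first byte of a page.
bool EndsAtPageBoundary(const void * p, std::size_t userBytes)
{
    return (reinterpret_cast<std::uintptr_t>(p) + userBytes) % PageSize() == 0;
}
```

With a buffer from `FenceAlloc(256)`, the question's `memset(p, 0, 256 + 1)` would touch the inaccessible page on its last byte, so the fault happens at the offending write itself and a debugger stops right there.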

score 0 · Answer 4 · edited May 23 '17 at 11:59

Please read these linked questions:

Visual Studio - how to find source of heap corruption errors

Is there a good Valgrind substitute for Windows?

They describe techniques for finding heap issues on Windows.

On the other hand, if you are writing new code you can always write your own memory manager: route allocations through wrapper APIs that call malloc/calloc etc.

Suppose you have an API myMalloc(size_t len). Inside that function, allocate HEADER + len + FOOTER bytes. In the header, save information such as the size of the allocation, or more. In the footer, write a magic number such as deadbeef. Then return ptr (from malloc) + HEADER from myMalloc.

When freeing with myfree(void *ptr), compute ptr - HEADER to get back to the header, read len from it, then jump to the footer at (ptr - HEADER) + HEADER + len, i.e. ptr + len. At that offset you should find deadbeef; if you don't, you know the block has been corrupted.
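The header/footer scheme above might look something like the sketch below. The details are my assumptions rather than anything fixed by the description: HEADER is padded to `sizeof(std::max_align_t)` to keep the returned pointer aligned, the footer is a 32-bit 0xDEADBEEF, and `myfree()` returns a bool instead of reporting the error itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Assumed layout: [HEADER bytes holding len][len user bytes][FOOTER magic]
static const std::size_t   HEADER = sizeof(std::max_align_t);
static const std::uint32_t MAGIC  = 0xDEADBEEFu;
static const std::size_t   FOOTER = sizeof(MAGIC);
static_assert(HEADER >= sizeof(std::size_t), "header must be able to hold the length");

void * myMalloc(std::size_t len)
{
    unsigned char * raw = static_cast<unsigned char *>(std::malloc(HEADER + len + FOOTER));
    if (raw == nullptr) return nullptr;
    std::memcpy(raw, &len, sizeof(len));                  // header: remember the size
    std::memcpy(raw + HEADER + len, &MAGIC, FOOTER);      // footer: magic number
    return raw + HEADER;
}

// Frees the block and returns true if the footer was intact,
// false if something wrote past the end of the user's bytes.
bool myfree(void * ptr)
{
    if (ptr == nullptr) return true;
    unsigned char * raw = static_cast<unsigned char *>(ptr) - HEADER;
    std::size_t len;
    std::memcpy(&len, raw, sizeof(len));                  // read the size back
    std::uint32_t footer;
    std::memcpy(&footer, raw + HEADER + len, FOOTER);     // footer lives at ptr + len
    std::free(raw);
    return footer == MAGIC;
}
```

Applied to the question's sample, a 256-byte `myMalloc()` buffer followed by `memset(p, 0, 256 + 1)` overwrites the first footer byte, so `myfree(p)` returns false. As with any check-on-free scheme, this only tells you which block was damaged, not when.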
