If automated tools (like electric fence or valgrind) don't do the trick, and staring intently at your code to try and figure out where it might have gone wrong doesn't help, and disabling/enabling various operations (until you get a correlation between the presence of heap-corruption and what operations did or didn't execute beforehand) to narrow it doesn't seem to work, you can always try this technique, which attempts to find the corruption sooner rather than later, so as to make it easier to track down the source:
Create your own custom new and delete operators that put corruption-evident guard areas around the allocated memory regions, something like this:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <new>
// make this however big you feel is "big enough" so that corrupted bytes will be seen in the guard bands
static int GUARD_BAND_SIZE_BYTES = 64;
static void * MyCustomAlloc(size_t userNumBytes)
{
// We'll allocate space for a guard-band, then space to store the user's allocation-size-value,
// then space for the user's actual data bytes, then finally space for a second guard-band at the end.
char * buf = (char *) malloc(GUARD_BAND_SIZE_BYTES+sizeof(userNumBytes)+userNumBytes+GUARD_BAND_SIZE_BYTES);
if (buf)
{
char * w = buf;
memset(w, 'B', GUARD_BAND_SIZE_BYTES); w += GUARD_BAND_SIZE_BYTES;
memcpy(w, &userNumBytes, sizeof(userNumBytes)); w += sizeof(userNumBytes);
char * userRetVal = w; w += userNumBytes;
memset(w, 'E', GUARD_BAND_SIZE_BYTES); w += GUARD_BAND_SIZE_BYTES;
return userRetVal;
}
else throw std::bad_alloc();
}
static void MyCustomDelete(void * p)
{
if (p == NULL) return; // since delete NULL is a safe no-op
// Convert the user's pointer back to a pointer to the top of our header bytes
char * internalCP = ((char *) p)-(GUARD_BAND_SIZE_BYTES+sizeof(size_t));
char * cp = internalCP;
for (int i=0; i<GUARD_BAND_SIZE_BYTES; i++)
{
if (*cp++ != 'B')
{
printf("CORRUPTION DETECTED at BEGIN GUARD BAND POSITION %i of allocation %p\n", i, p);
abort();
}
}
// At this point, (cp) should be pointing to the stored (userNumBytes) field
size_t userNumBytes = *((const size_t *)cp);
cp += sizeof(userNumBytes); // skip past the user's data
cp += userNumBytes;
// At this point, (cp) should be pointing to the second guard band
for (int i=0; i<GUARD_BAND_SIZE_BYTES; i++)
{
if (*cp++ != 'E')
{
printf("CORRUPTION DETECTED at END GUARD BAND POSITION %i of allocation %p\n", i, p);
abort();
}
}
// If we got here, no corruption was detected, so free the memory and carry on
free(internalCP);
}
// override the global C++ new/delete operators to call our
// instrumented functions rather than their normal behavior
void * operator new(size_t s) throw(std::bad_alloc) {return MyCustomAlloc(s);}
void * operator new[](size_t s) throw(std::bad_alloc) {return MyCustomAlloc(s);}
void operator delete(void * p) throw() {MyCustomDelete(p);}
void operator delete[](void * p) throw() {MyCustomDelete(p);}
... the above will be enough to get you Electric-Fence style functionality, in that if anything writes into either of the two 64-byte "guard bands" at the beginning or end of any new/delete memory-allocation, then when the allocation is deleted, MyCustomDelete() will notice the corruption and crash the program.
If that's not good enough (e.g. because by the time the deletion occurs, so much has happened since the corruption that it's difficult to tell what caused the corruption), you can go even further by having MyCustomAlloc() add the allocated buffer into a singleton/global doubly-linked list of allocations, and have MyCustomDelete() remove it from that same list (make sure to serialize these operations if your program is multithreaded!). The advantage of doing that is that you can then add another function called e.g. CheckForHeapCorruption() that will iterate over that linked list and check the guard-bands of every allocation in the linked list, and report if any of them have been corrupted. Then you can sprinkle calls to CheckForHeapCorruption() throughout your code, so that when heap corruption occurs it will be detected at the next call to CheckForHeapCorruption() rather than some time later on. Eventually you will find that one call to CheckForHeapCorruption() passed with flying colors, and then the next call to CheckForHeapCorruption(), just a few lines later, detected corruption, at which point you know that the corruption was caused by whatever code executed between the two calls to CheckForHeapCorruption(), and you can then study that particular code to figure out what it's doing wrong, and/or add more calls to CheckForHeapCorruption() into that code as necessary.
Repeat until the bug becomes obvious. Good luck!