
I'm having a strange problem with the VS2012 compiler that doesn't seem to show up with GCC: the deallocation phase ends up taking minutes rather than seconds. Does anyone have any input on this? Step debugging shows a noticeable hang at calls to RtlpCollectFreeBlocks(). I have this problem in both debug and release mode. I'm running Windows 7 32-bit, but I have the same problem on Windows 7 64-bit.

```cpp
#include "stdafx.h"
#include <iostream>
#include <stdint.h>
#include <cstdlib>

#define SIZE 500000

using namespace std;

typedef struct
{
    uint32_t* thing1;
}collection;

/*
 * VS2012 compiler used.
 * Scenarios: 
 *  1) Don't allocate thing1. Program runs poorly.
 *  2) Allocate thing1 but don't delete it. Program runs awesome.
 *  3) Allocate thing1 and delete it. Program runs poorly.
 * 
 * Debug or Release mode does not affect outcome. GCC's compiler is fine.
 */
int _tmain(int argc, _TCHAR* argv[])
{
    collection ** colArray = new collection*[SIZE];

    for(int i=0;i<SIZE;i++)
    {
        collection * mine = new collection;
        mine->thing1 = new uint32_t; // Allocating without freeing runs fine. Either A) don't allocate or B) allocate and delete to make it run slow.
        colArray[i] = mine;
    }

    cout<<"Done with assignment\n";

    for(int i=0;i<SIZE;i++)
    {
        delete(colArray[i]->thing1); // delete makes it run poorly.
        delete(colArray[i]);

        if(i > 0 && i%100000 == 0)
        {
            cout<<"100 thousand deleted\n";
        }
    }
    delete [] colArray;

    cout << "Done!\n";
    int x;
    cin>>x;
}
```
WhozCraig
Sean
  • It's not generally idiomatic C++ to allocate a single int on the heap - if you store it by value does that help? – Mark B Sep 11 '13 at 15:59
  • This was just a simple example. I originally had 4 uint32_t's in the struct. The idea was to show that the size of the struct doesn't seem to matter for this issue. – Sean Sep 11 '13 at 16:04
  • @Sean Using a binary search on `SIZE` is there a point at which the performance drastically improves? – Mark B Sep 11 '13 at 16:06
  • It certainly appears to be IDE-hook related. Switch to a command prompt and run your program from the console. No problems at all, and the memory model and debug state make no difference. – WhozCraig Sep 11 '13 at 16:20
  • My 2 cents: before VS2005 we could use the single-threaded runtime library in VC++; since VS2005 only the multithreaded runtime libraries, selected with /MT or /MD, are available. When we debugged code back then, we found that free() in the multithreaded runtime library was slower than in the single-threaded one because of synchronization. I'm not familiar with GCC. Does a program compiled with GCC use the equivalent of /MD or /MT in VS? – Matt Sep 11 '13 at 16:23
  • does http://stackoverflow.com/questions/3360900/visual-c-difference-between-start-with-without-debugging-in-release-mode/4375879#4375879 help? – Mike Vine Sep 11 '13 at 16:42
  • along with http://stackoverflow.com/questions/6486282/set-no-debug-heap – Mike Vine Sep 11 '13 at 16:43
  • @WhozCraig Thanks man, that seems to be the issue. Post a response and I'll set it as the answer. – Sean Sep 11 '13 at 17:11
  • Have you tried setting `_CrtSetDbgFlag(0)` in the code? Does this happen with VS2010 and earlier? What about if you have WinDbg attached instead? – Chris O Sep 12 '13 at 01:30
  • @MikeVine I wish I had seen your links earlier. It would have saved me a ton of self-testing just to get a feel for how this worked. Alas, at least it was interesting (as well as pointless). I think the first link is reason enough to close this question as a dupe. It was for VS2010, but it still seems to apply. – WhozCraig Sep 12 '13 at 03:54

1 Answer


The performance hit you're seeing comes from the Windows debug heap, which is a little stealthy in how it enables itself, even in release builds.

I took the liberty of building a 64-bit debug image of a simpler program and found this call stack:

  • msvcr110d.dll!_CrtIsValidHeapPointer(const void * pUserData=0x0000000001a8b540)
  • msvcr110d.dll!_free_dbg_nolock(void * pUserData=0x0000000001a8b540, int nBlockUse=1)
  • msvcr110d.dll!_free_dbg(void * pUserData=0x0000000001a8b540, int nBlockUse=1)
  • msvcr110d.dll!operator delete(void * pUserData=0x0000000001a8b540)

Of particular interest to me was the body of msvcr110d.dll!_CrtIsValidHeapPointer, which turns out to be this:

```cpp
if (!pUserData)
    return FALSE;

// Note: all this does is check for null
if (!_CrtIsValidPointer(pHdr(pUserData), sizeof(_CrtMemBlockHeader), FALSE))
    return FALSE;

// but this is e-x-p-e-n-s-i-v-e
return HeapValidate( _crtheap, 0, pHdr(pUserData) );
```

That HeapValidate() call is brutal.

OK, I might expect this in a debug build, but certainly not in release. As it turns out, the release build does avoid that particular check, but look at its call stack:

  • ntdll.dll!RtlDebugFreeHeap()
  • ntdll.dll!string "Enabling heap debug options\n"()
  • ntdll.dll!RtlFreeHeap()
  • kernel32.dll!HeapFree()
  • msvcr110.dll!free(void * pBlock)

This is interesting, because when I ran the program first and then attached to the running process with the IDE (or WinDbg), without letting the debugger control the startup environment, this call stack stopped at ntdll.dll!RtlFreeHeap(). In other words, when the process is not started from the IDE, RtlDebugFreeHeap is not invoked. But why?

I figured the debugger must somehow be flipping a switch to enable heap debugging. After some digging I found that the "switch" is the debugger itself: Windows uses the special debug heap functions (RtlDebugAllocHeap and RtlDebugFreeHeap) if the process being run was spawned by a debugger. This MSDN page on WinDbg alludes to this, along with other interesting tidbits about debugging under Windows:

From "Debugging a User-Mode Process Using WinDbg":

Processes that the debugger creates (also known as spawned processes) behave slightly differently than processes that the debugger does not create.

Instead of using the standard heap API, processes that the debugger creates use a special debug heap. You can force a spawned process to use the standard heap instead of the debug heap by using the _NO_DEBUG_HEAP environment variable or the -hd command-line option.

Now we're getting somewhere. To test this, I simply added a sleep() long enough for me to attach the debugger rather than spawning the process with it, then let the program run on its merry way. Sure enough, as mentioned previously, it sailed along at full speed.

Based on that article, I have taken the liberty of updating my Release-mode configurations to define _NO_DEBUG_HEAP=1 in the Environment field of their Debugging project settings. I'm obviously still interested in granular heap activity in debug builds, so those configurations stay as-is. After this change, my release builds run substantially faster under VS2012 (and VS2010), and I invite you to try it as well.

WhozCraig