
I have a .NET application (a Windows Service) that consumes too much unmanaged memory after running for a while, until it crashes with an OutOfMemoryException. More info in this question (deleted; 10k users only).

I've managed to create a Supervisor program that scans the application's resource consumption, takes regular memory snapshots with VMMap, and also sets a breakpoint on the VirtualAlloc() function using the following command (formatted for readability):

bp KERNELBASE!VirtualAlloc ".if (@rdx>=0x2FAF080) {
    .printf \"============Allocating %lu bytes  ================\n\", @rdx; 
    kb 8; !clrstack; gc
} .else{gc}"

But the WinDbg output shows some weird values and I'm not able to match them to the allocations shown by VMMap, so I suspect RDX is the wrong register to test in the conditional breakpoint.

I need to set a correct breakpoint to track those unmanaged memory allocations and their stack traces, and finally find the guilty code.
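As a sanity check, I can trigger a single unmanaged allocation of a known size and see whether the breakpoint prints that exact number (as noted in the comments below, calling kernel32!VirtualAlloc directly from C# reproduces the same VMMap pattern as the leak). This is only a minimal sketch of such a probe; the 100 MB size is arbitrary, chosen to be above the 0x2FAF080 threshold in the breakpoint:

    using System;
    using System.Runtime.InteropServices;

    class VirtualAllocProbe
    {
        // Plain P/Invoke declaration for VirtualAlloc (allocates in the current process).
        [DllImport("kernel32.dll", SetLastError = true)]
        static extern IntPtr VirtualAlloc(IntPtr lpAddress, UIntPtr dwSize,
                                          uint flAllocationType, uint flProtect);

        const uint MEM_COMMIT     = 0x1000;
        const uint MEM_RESERVE    = 0x2000;
        const uint PAGE_READWRITE = 0x04;

        static void Main()
        {
            // Allocate a known, easy-to-spot size (100 MB) so the breakpoint output
            // and the VMMap "Private Data" delta can be matched exactly.
            UIntPtr size = (UIntPtr)(100u * 1024 * 1024); // 104857600 bytes

            IntPtr p = VirtualAlloc(IntPtr.Zero, size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);

            if (p == IntPtr.Zero)
                Console.WriteLine("VirtualAlloc failed: " + Marshal.GetLastWin32Error());
            else
                Console.WriteLine("Allocated " + size + " bytes at 0x" + p.ToInt64().ToString("X"));

            Console.ReadLine(); // keep the process alive for VMMap / WinDbg inspection
        }
    }

If RDX really carries the size, the breakpoint should report 104857600 bytes for this call, and VMMap should show the same amount added to Private Data.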

UPDATE: This is an example of the breakpoint output with the native stack. I consider the byte count shown here inaccurate because VMMap does not show any allocation of that size (3.6 GB). One curious thing is that the same value appears as an argument in the clr!CExecutionEngine::ClrVirtualAlloc frame (see the d8040000 value; 0xd8040000 = 3,624,140,800).

============Allocating 3624140800 bytes  ================ 
RetAddr           : Args to Child                                                           : Call Site
00007ffe`5844395a : 00000001`111bf000 00000000`d2cb8000 00000000`00a229da 00000000`5ff17000 : KERNELBASE!VirtualAlloc
00007ffe`584adf14 : 00000000`00000004 00000000`00000000 00000000`d8040000 00000051`000fcbf0 : clr!CExecutionEngine::ClrVirtualAlloc+0x4a
00007ffe`589da6c7 : 00000000`00000000 00000000`00100000 00000051`80490000 00000051`000fcbf0 : clr!ClrVirtualAlloc+0x3c
00007ffe`589da270 : 00000000`00000000 00000051`000fcdc8 00000051`80490000 00000000`0006e120 : clr!WKS::gc_heap::grow_brick_card_tables+0x177
00007ffe`589d9ee4 : 00000000`08000000 00000000`00000023 00000000`00000000 ffffffff`fffffff8 : clr!WKS::gc_heap::get_segment+0x140
00007ffe`589dae9e : 00000000`08000000 00000000`00000000 00000051`000fcde0 00000051`000fcdb0 : clr!WKS::gc_heap::get_large_segment+0x204
00007ffe`58829226 : 00000000`0000000c 00000000`00000000 00000000`00000000 00000000`00000000 : clr!WKS::gc_heap::loh_get_new_seg+0x5e
00007ffe`585313b1 : 0000fffc`00000003 00000000`00000003 00000000`00000003 00000000`0006e138 : clr!WKS::gc_heap::allocate_large+0x2f8156
  • What makes you think VirtualAlloc() is an unmanaged allocation of memory? – Thomas Weller Jan 09 '17 at 22:14
  • The size is the second parameter to [VirtualAlloc](https://msdn.microsoft.com/en-us/library/windows/desktop/aa366887(v=vs.85).aspx), so RDX seems right. What is a "weird" value? Note that there's also [VirtualAllocEx](https://msdn.microsoft.com/en-us/library/windows/desktop/aa366890(v=vs.85).aspx) – Thomas Weller Jan 09 '17 at 22:21
  • You could try ETW VirtualAlloc tracing: https://channel9.msdn.com/Shows/Defrag-Tools/Defrag-Tools-154-Memory-Footprint-and-Leaks See the video at 16:57. That should be much easier to analyze than Windbg stacks. – Alois Kraus Jan 09 '17 at 22:54
  • @ThomasWeller, the total committed memory (!address -summary) is much bigger than the sum of the GC heap (!eeheap -gc) and the unmanaged heaps (!heap -s), so it's not managed memory, but it's not on the regular unmanaged heaps either. Also, I managed to call kernel32!VirtualAlloc directly from C# and it produces the same VMMap output (Private Data grows, not the GC heap). My problem is identical to [this one](http://kate-butenko.blogspot.com.br/2015/09/live-debugging-with-windbg-and-vmmap.html). A "weird" value is like "3352074344". I should put a breakpoint on "VirtualAllocEx" too. – Vinícius Oliveira Jan 10 '17 at 01:37
  • @AloisKraus, I've tried that; the ETW event capture gets huge in a small amount of time. I tried xperf (also wpr), but logging VirtualAllocs produced a 500 MB etl file for just 1 minute of tracing. I tried to filter by process, but it seems you cannot filter ETW by process. The problem only happens after 1 or 2 hours of the application running. But it could be my final shot at least. – Vinícius Oliveira Jan 10 '17 at 01:53
  • I'm not sure whether reserved memory counts towards the .NET heap. – Thomas Weller Jan 10 '17 at 07:38
  • Maybe you can also try [Rohitab API Monitor](http://www.rohitab.com/apimonitor). It might not be able to resolve the .NET call stack, though. – Thomas Weller Jan 10 '17 at 07:41
  • @ThomasWeller, just tried Rohitab API Monitor; quite useful indeed, but as you pointed out, it cannot show the CLR call stack, only the native one. I'm updating the question with one WinDbg breakpoint output. – Vinícius Oliveira Jan 10 '17 at 16:51
  • @Vinicius 500 MB/min looks a bit excessive. If you start tracing with wpr -start VirtualAllocation and stop it with wpr -stop c:\temp\alloc.etl, you should end up, even on a busy machine, at around 1 GB per 10 minutes. To get the leak in question you can define a performance counter trigger which executes the stop command when the process's private bytes exceed a threshold; that should give you a clue what goes on when a big jump in memory consumption happens. – Alois Kraus Jan 10 '17 at 17:50
  • @AloisKraus, I could indeed embed that collection in the Supervisor (a sketch follows these comments). Thanks for the suggestion. Since the main subject of the topic seems settled (looks like the breakpoint is correct, at least), I'll close the question. Thanks guys. – Vinícius Oliveira Jan 11 '17 at 00:50
  • Use PerfView or Windows Performance Analyzer. It would get you the allocations and the stacks; that would be more helpful than playing a conditional breakpoint game with WinDbg. – Addy Jan 16 '17 at 17:39
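Following up on the comment above about embedding the collection in the Supervisor: a rough sketch of the performance-counter-style trigger Alois describes, polling the target process's private bytes and stopping the wpr trace once a threshold is crossed. The process name, the 2 GB threshold, and the ETL path are assumptions; the wpr commands are the ones from his comment.

    using System;
    using System.Diagnostics;
    using System.Linq;
    using System.Threading;

    class AllocTraceTrigger
    {
        static void Main()
        {
            const string targetName   = "MyWindowsService";         // hypothetical service process name
            const long thresholdBytes = 2L * 1024 * 1024 * 1024;     // assumed 2 GB trigger point

            // Start the VirtualAlloc tracing session (wpr needs an elevated prompt).
            Process.Start("wpr", "-start VirtualAllocation").WaitForExit();

            while (true)
            {
                // PrivateMemorySize64 reports the same value as the "Private Bytes" counter.
                var target = Process.GetProcessesByName(targetName).FirstOrDefault();
                if (target != null && target.PrivateMemorySize64 >= thresholdBytes)
                {
                    // Stop and save the trace right when the big jump in memory happens.
                    Process.Start("wpr", "-stop c:\\temp\\alloc.etl").WaitForExit();
                    break;
                }
                Thread.Sleep(TimeSpan.FromSeconds(10));
            }
        }
    }

The resulting ETL can then be opened in Windows Performance Analyzer (or PerfView, as Addy suggests) to get the allocation stacks.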

1 Answer


After @ThomasWeller's comment, I guess the breakpoint is indeed correct.

About the memory problem, I got in contact with Microsoft support; they did find a chunk of memory, but it was all zeros! Still, I haven't found the specific cause of that behavior.

Since the main subject of this question was the accuracy of the breakpoint, I'm now closing this topic.