7

I'm tasked with improving a piece of code that generates massive reports, in any way I see fit.

There are about 10 identical reports generated (one for each 'section' of the database), and the code for them is similar to this:

GeneratePurchaseReport(Country.France, ProductType.Chair);
GC.Collect();
GeneratePurchaseReport(Country.France, ProductType.Table);
GC.Collect();
GeneratePurchaseReport(Country.Italy, ProductType.Chair);
GC.Collect();
GeneratePurchaseReport(Country.Italy, ProductType.Table);
GC.Collect();

If I remove those GC.Collect() calls, the reporting service crashes with OutOfMemoryException.

The bulk of the memory is kept in a massive List<T> which is filled inside GeneratePurchaseReport and is no longer of use as soon as it exits - which is why a full GC collection will reclaim the memory.

My question is two-fold:

  1. Why doesn't the GC do this on its own? As soon as it's running out of memory on the second GeneratePurchaseReport it should do a full collection before crashing and burning, shouldn't it?
  2. Is there a memory limit which I can raise somehow? I don't mind at all if data is swapped to disk, but the .net process is using far less memory than even the available 2.5GB of RAM! I'd expect it to only crash if it's run out of address space but on a 64-bit machine I doubt that happens so soon.
Mitch Wheat
configurator
  • 1
    Well, when I looked at just the title, I thought you need some sort of stress-application that attempts burn memory for testing purposes. – Christian.K May 19 '11 at 04:51
  • What is ***GeneratePurchaseReport*** ? Local Report RDLC or Remote RDL in SSRS server ? – Kiquenet Oct 31 '15 at 17:55
  • **GcHelper** https://github.com/mcctomsk/MccTomskHelpers/blob/560a079172468fd44ce952fbfcd676d297602442/Core/GcHelper.cs Notes: http://stackoverflow.com/questions/10016541/garbage-collection-not-happening-even-when-needed and _The reason why GC.Collect is called twice: http://stackoverflow.com/questions/3829928/under-what-circumstances-we-need-to-call-gc-collect-twice_ – Kiquenet Nov 02 '15 at 11:33

5 Answers

5

Read up on the Large Object Heap.

I think what's happening is that the final document for individual reports is built and appended to over time, such that at each append operation a new document is created and the old is discarded (that probably happens behind the scenes). This document is (eventually) larger than the 85,000 byte threshold for storage on the Large Object Heap.

In this scenario you're not actually using that much physical memory; it's still available for other processes. What you are using is the address space available to your program. Every process in Windows gets its own address space, typically 2GB for a 32-bit process. Over time, as you allocate new copies of your growing report document, you leave behind numerous holes in the LOH when the prior copies are collected. The memory freed by those prior objects is no longer used and is available to other processes, but the address space is still lost; it's fragmented and would need to be compacted to become usable again. Eventually the address space fills up and you get an OutOfMemoryException.
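The append-and-discard pattern looks roughly like this (a sketch only; `rows` and `FormatRow` are illustrative names, not from the question):

```csharp
// Illustrative only: naive string accumulation. Once 'report' passes
// ~85,000 bytes, every concatenation allocates a fresh, slightly larger
// string on the LOH and abandons the previous copy there as a hole.
string report = "";
foreach (var row in rows)
{
    report += FormatRow(row);   // brand-new string on every iteration
}
```

A `StringBuilder` softens this particular pattern because it grows its internal buffer geometrically instead of reallocating on every append, but a sufficiently large buffer still ends up on the LOH.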

The evidence suggests that calls to GC.Collect() allow for some compaction of the LOH, but it's not a perfect solution. Just about everything else I've read on the subject indicates that GC.Collect() is not supposed to compact the LOH at all, but I've seen several anecdotal reports (some here on Stack Overflow) where calling GC.Collect() was in fact able to avert OutOfMemoryExceptions caused by LOH fragmentation.
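(A later note for readers: starting with .NET Framework 4.5.1, which postdates this answer, the LOH can be compacted explicitly on demand:)

```csharp
using System;
using System.Runtime;

class CompactLohOnce
{
    static void Main()
    {
        // Requests that the next full blocking GC also compact the LOH.
        // The setting automatically resets to Default after that collection.
        GCSettings.LargeObjectHeapCompactionMode =
            GCLargeObjectHeapCompactionMode.CompactOnce;
        GC.Collect();
    }
}
```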

A "better" solution (in terms of being sure you won't ever run out of memory -- using GC.Collect() to compact the LOH just isn't reliable) is to splinter your report into units smaller than 85,000 bytes and write them all into a single buffer at the end, or use a data structure that doesn't throw away your prior work as it grows. Unfortunately, this is likely to be a lot more code.
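A sketch of the chunked approach (sizes and names are illustrative assumptions): keep each finished piece under the 85,000-byte LOH threshold, and remember that .NET strings are UTF-16, so the character budget is roughly half the byte budget.

```csharp
using System.Collections.Generic;
using System.Text;

class ChunkedReportBuilder
{
    // 32K chars is ~64 KB as UTF-16, comfortably under the 85,000-byte threshold.
    private const int ChunkChars = 32 * 1024;
    private readonly List<string> chunks = new List<string>();
    private readonly StringBuilder current = new StringBuilder(ChunkChars);

    public void Append(string fragment)
    {
        if (current.Length + fragment.Length > ChunkChars)
        {
            chunks.Add(current.ToString());  // each finished chunk stays off the LOH
            current.Length = 0;              // .NET 2.0-era reset (no Clear())
        }
        current.Append(fragment);
    }

    public IEnumerable<string> Finish()
    {
        if (current.Length > 0)
            chunks.Add(current.ToString());
        return chunks;   // stream these to disk one at a time
    }
}
```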

One relatively simple option here is to allocate a buffer for a MemoryStream object that is bigger than your largest report, and then write into the MemoryStream as you build the report. This way you never leave fragments. If the report is just written to disk, you might even go straight to a FileStream (perhaps via a TextWriter, to make it easy to change later). If this option solves your problem, I'd like to hear about it in a comment to this answer.
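A sketch of both variants (the capacity, paths, and `WriteReport` helper are illustrative assumptions, since the real code isn't shown):

```csharp
using System.IO;

class StreamedReports
{
    // Variant 1: pre-allocate one reusable MemoryStream, sized above the
    // largest expected report so its buffer never has to grow.
    static readonly MemoryStream Buffer = new MemoryStream(50 * 1024 * 1024);

    static void RunReport(string path)
    {
        Buffer.SetLength(0);                 // rewind without freeing the buffer
        Buffer.Position = 0;
        var writer = new StreamWriter(Buffer);
        WriteReport(writer);                 // hypothetical report-writing helper
        writer.Flush();                      // don't Dispose: that would close Buffer
        using (var file = new FileStream(path, FileMode.Create))
            Buffer.WriteTo(file);            // copy straight from the reused buffer
    }

    // Variant 2: skip the intermediate buffer and write to disk directly.
    static void RunReportDirect(string path)
    {
        using (var writer = new StreamWriter(path))
            WriteReport(writer);             // TextWriter keeps the target swappable
    }

    static void WriteReport(TextWriter writer) { /* build the report here */ }
}
```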

Joel Coehoorn
  • Everything I've read suggests that the LOH is never compacted? Objects are garbage collected in the LOH (during Gen 2 collection) (and presumably free adjacent regions concatenated), but not compacted, afaik. – Mitch Wheat May 19 '11 at 02:33
  • 1
    @Mitch - it's not compacted afaik as well, but I've seen several instances now where GC.Collect() was able to correct apparent LOH fragmentation, and so I'm starting to wonder if an update or patch has rendered these articles obsolete – Joel Coehoorn May 19 '11 at 02:48
  • I'd be surprised (and disappointed) if such a far-reaching change had been made without telling the wider community. – Mitch Wheat May 19 '11 at 02:51
  • Call `GC.Collect()` twice: http://stackoverflow.com/questions/3829928/under-what-circumstances-we-need-to-call-gc-collect-twice – Kiquenet Nov 02 '15 at 19:24
3

We would need to see your code to be sure.

Failing that:

  • Are you pre-sizing the List with an expected number of items?

  • Can you pre-allocate and use an array instead of a list? (Boxing/unboxing might then be an additional cost.)

  • Even on a 64-bit machine, the largest size a single CLR object can be is 2GB.

  • Pre-allocate a MemoryStream to hold the entire report, and write to that.
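For the first two bullets, a sketch (the row type, count, and helper are illustrative assumptions, since the real code isn't shown):

```csharp
using System.Collections.Generic;

class PreSizedCollections
{
    static void BuildRows(int expectedRows)
    {
        // Pre-sizing means List<T> allocates its backing array once, instead
        // of repeatedly doubling it and abandoning the old copies (which is
        // what litters the LOH once the array passes 85,000 bytes).
        var rows = new List<string>(expectedRows);

        // Or skip List<T> entirely and fill a plain array:
        var rowArray = new string[expectedRows];
    }
}
```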

I would suggest using a memory profiler such as memprofiler or Red Gate's ANTS (both have free trials) to see where the problem actually lies.

Mitch Wheat
1

The reason is probably the Large Object Heap, along with any objects that use a native heap internally, e.g. the Bitmap class. The Large Object Heap also behaves like a traditional C heap, which fragments. Fragmentation is one aspect of this issue.

But I think it also has something to do with how the GC determines when to collect. It works perfectly for the normal generational heaps, but for memory allocated in other heaps, especially in native heaps, it may not have enough information to make a perfect decision. And the LOH is treated as generation 2, which means it has the least chance of being collected.

So in your case, I think manually forcing a collection is a reasonable solution. But no, it is not perfect.

PS: I'd like to add a bit more info to Joel's good explanation. The LOH threshold is 85,000 bytes for normal objects, but for arrays of double it is only 8,000 bytes.

Dudu
0

Are you using Microsoft SQL Server Reporting Services?

If so: http://support.microsoft.com/kb/909678

Steve Wellens
-3

First of all, garbage collection runs on one assumption: the capacity of the heap is unlimited. Garbage collectors do not collect objects when the program runs out of memory; they collect objects that are no longer used by the program. Depending on the GC algorithm, I believe the GC marks the memory used for the report as still being used. Therefore, it cannot simply remove it.

The reason the GC does not do its job when the consecutive GeneratePurchaseReport() calls are made is that the GC is not running all the time. It employs certain algorithms to predict how often garbage should be collected based on past behavior, and in your case it evidently does not predict that garbage needs to be collected across those four consecutive lines.

awong
  • 1
    If the capacity of the heap were unlimited, GC wouldn't be necessary. I'd say the very existence of GC relies on the assumption that the heap *is* limited. – cHao May 19 '11 at 17:16