
Folks - I have a task A which I perform sequentially ~100 times. Each A spawns many tasks B, which are processed in parallel. Each B stores data that is needed only after all A's are completed, so the program's memory footprint grows over time. These are "long running" tasks.

I found that each successive A was taking longer to complete. I enabled Server Garbage Collection and saw a dramatic improvement - A's time-to-complete was cut in half! However, A's time-to-complete was still growing linearly with each successive A. So by the 10th A, the gains from Server Garbage Collection were irrelevant - the full 100 A's would never complete in a reasonable time and I would need to stop the process.

My hypothesis is that the growing memory footprint is causing GC to do more work over time, slowing everything down.

  1. Do you have any other hypotheses I can test?
  2. If my hypothesis is worth exploring, what solutions can I pursue? Should I become more "hands-on" with the Garbage Collector? Maybe I should flush in-memory data to disk and read it back when I need it?

EDIT
I forgot to mention that each B calls GC.Collect and GC.WaitForPendingFinalizers because I'm automating COM and that's the only way to ensure the COM server process is released.
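For reference, each B's cleanup follows the usual Excel-interop pattern - roughly this (a sketch, not my exact code; `app` and `workbook` stand in for whatever interop references the task actually holds):

```csharp
// Sketch of each B's COM cleanup (assumes Microsoft.Office.Interop.Excel;
// app/workbook are placeholders for the task's real RCW references).
static void CleanUpComReferences(ref Excel.Application app,
                                 ref Excel.Workbook workbook)
{
    workbook.Close(false);
    app.Quit();

    // Drop every managed reference to the RCWs first...
    workbook = null;
    app = null;

    // ...then collect twice: the first pass queues the RCW finalizers,
    // WaitForPendingFinalizers lets them release the COM objects, and the
    // second pass reclaims what the finalizers freed.
    GC.Collect();
    GC.WaitForPendingFinalizers();
    GC.Collect();
    GC.WaitForPendingFinalizers();
}
```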

Suraj
  • 35,905
  • 47
  • 139
  • 250
  • 1
    .net has a heap profiler (CLRProfiler), which you can use to find out what data are live and what holds pointers to them. I don't have experience with .net, but the heap profiler in Haskell has helped me find GC-related memory leaks. – Heatsink Mar 14 '11 at 15:19
  • I'm not concerned about a leak...the growth in the memory footprint is expected. As I said, "B stores data which is needed after all A's are completed" – Suraj Mar 14 '11 at 15:20
  • Sounds like a normal out-of-memory situation due to a leak or simply too much data. When nearly out of memory, the GC repeatedly tries to free memory and fails -> the CPU is busy with garbage collection. This goes on for some time until OutOfMemoryExceptions are thrown. – Markus Kull Mar 14 '11 at 15:25
  • I'm nowhere near out of memory. I'm using 6 GB and I have almost 9 GB remaining (x64). There's no memory leak either. – Suraj Mar 14 '11 at 15:41
  • SORRY....forgot to mention that I'm automating COM so I'm forced to call GC.Collect/GC.WaitForPendingFinalizers with each task B. That ensures the COM process is released. – Suraj Mar 14 '11 at 15:45
  • This remark is in msdn documentation for WaitForPendingFinalizers() : "The thread on which finalizers are run is unspecified, so there is no guarantee that this method will terminate." Maybe this is the cause of your problem. – Kipotlov Mar 14 '11 at 15:56
  • 1
    Why don't you use `Dispose` to get rid of the native resources? Calling GC.Collect that often is usually wrong. – CodesInChaos Mar 14 '11 at 16:21
  • You may need to do your own clean up and not rely on the GC. Relying on GC isn't a good idea. If server GC didn't work then you need to look over your code and optimize. Too many new ups or string manipulations will hurt perf. – Dustin Davis Mar 14 '11 at 17:11
  • Calling GC.Collect is unfortunately right in this case =) http://stackoverflow.com/questions/158706/how-to-properly-clean-up-excel-interop-objects-in-c/159419#159419 – Suraj Mar 14 '11 at 17:16
  • @SFun28 I read that post as "It's hard to write correct COM code, so I'll just use an ugly hack". I don't buy that it's the right thing. – CodesInChaos Mar 16 '11 at 10:11
  • @CodeInChaos - I think that's a harsh analysis. Tons of people automate Excel with success in the way the article prescribes. I do too. I automate tens-of-thousands of Excel processes continuously over a few days and it all works just fine. The AppDomain approach is overkill for most scenarios. – Suraj Mar 17 '11 at 04:09

2 Answers


Found the issue! When each A completes, I now serialize its data to disk, so the memory footprint stays constant across A's. With this technique the time to complete each A no longer grows linearly - it's constant. After all A's are done, I simply deserialize the data from disk. Although that step is somewhat time-consuming because there's now a lot of data to read, it shouldn't compare to the amount of time saved by keeping each A at constant time.
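The change amounts to something like this (a sketch only - `ResultB` and `workDir` are placeholders for my real type and path, and `ResultB` must be marked `[Serializable]`):

```csharp
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

// After each A completes: persist its results and drop the in-memory copy.
static void SaveResults(string workDir, int taskIndex, List<ResultB> results)
{
    var formatter = new BinaryFormatter();
    using (var stream = File.Create(Path.Combine(workDir, "A" + taskIndex + ".bin")))
        formatter.Serialize(stream, results);
    results.Clear();   // memory footprint stays flat across A's
}

// After all 100 A's: read everything back for the final step.
static List<ResultB> LoadResults(string workDir, int taskIndex)
{
    var formatter = new BinaryFormatter();
    using (var stream = File.OpenRead(Path.Combine(workDir, "A" + taskIndex + ".bin")))
        return (List<ResultB>)formatter.Deserialize(stream);
}
```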

Suraj
  • 35,905
  • 47
  • 139
  • 250

If you're really too lazy to dispose your COM stuff cleanly, I'd use one AppDomain for the COM-Interop, and another to accumulate the data.
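A sketch of what I mean (`ComWorker` is a placeholder `MarshalByRefObject` that does one B's COM work; anything crossing the domain boundary must be serializable):

```csharp
// Run each B's COM work in a throwaway AppDomain. Unloading the domain
// runs its finalizers and tears down its RCWs, so the main domain never
// needs GC.Collect just to release the COM server.
AppDomain domain = AppDomain.CreateDomain("com-interop");
try
{
    var worker = (ComWorker)domain.CreateInstanceAndUnwrap(
        typeof(ComWorker).Assembly.FullName,
        typeof(ComWorker).FullName);
    BResult data = worker.RunTaskB(input);  // result marshaled back by value
    // accumulate "data" here, in the main domain
}
finally
{
    AppDomain.Unload(domain);  // the COM server dies with the domain
}
```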

CodesInChaos
  • 106,488
  • 23
  • 218
  • 262
  • It has nothing to do with disposing COM cleanly. Read any article about Excel automation. You HAVE to call GC.Collect. There's no way around it. – Suraj Mar 17 '11 at 04:06
  • Unless you do the AppDomain thing...but that's not a common pattern when dealing with Excel. – Suraj Mar 17 '11 at 04:07