
Situation
I have created a Monster. The Applikentstein is a RAM killer, and I still do not clearly understand how and why it is so rogue, apparently by design. Basically, it is a computing application that launches test combinations for N scenarios. Each test runs in a single dedicated thread and involves reading large binary data, processing it, and writing the results to a DB. While the data is being read, it is first put into buffered queues (~100 MB), and it is dequeued on the fly while it is processed. Chunks of data are tracked in arrays during processing. As for threading, I have a worker buffer queue with N workers; every time one finishes and dies, a new one is launched via a Queue.
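A minimal sketch of the pipeline described above, written in Python rather than C# since the question includes no code. It assumes each scenario has one reader thread feeding one processing thread; the names (`read_chunks`, `run_scenario`, `run_all`) and all sizes are hypothetical. What it illustrates is that a buffer only caps memory if it is bounded: `queue.Queue(maxsize=...)` makes the reader block when the buffer is full, and a semaphore caps the number of concurrent scenarios at N. In .NET, a `BlockingCollection<T>` created with a bounded capacity and a `SemaphoreSlim` would play the same roles.

```python
import queue
import threading

CHUNK_SIZE = 1024         # bytes per chunk (hypothetical; real chunks would be far larger)
MAX_BUFFERED_CHUNKS = 4   # bound on each scenario's read-ahead buffer
MAX_WORKERS = 3           # "N": scenarios allowed to run at the same time

_END = object()           # sentinel marking the end of a scenario's data


def read_chunks(source: bytes, buf: queue.Queue) -> None:
    """Producer: put() blocks while the buffer is full, so reading cannot outrun processing."""
    for offset in range(0, len(source), CHUNK_SIZE):
        buf.put(source[offset:offset + CHUNK_SIZE])
    buf.put(_END)


def run_scenario(name: str, source: bytes, results: dict, slots: threading.Semaphore) -> None:
    """Consumer: processes one scenario, then frees its slot for the next one."""
    try:
        buf = queue.Queue(maxsize=MAX_BUFFERED_CHUNKS)
        reader = threading.Thread(target=read_chunks, args=(source, buf))
        reader.start()
        total = 0
        while (chunk := buf.get()) is not _END:
            total += sum(chunk)           # stand-in for the real processing step
        reader.join()
        results[name] = total             # stand-in for writing the result to the DB
    finally:
        slots.release()


def run_all(scenarios: dict) -> dict:
    slots = threading.Semaphore(MAX_WORKERS)
    results = {}
    workers = []
    for name, source in scenarios.items():
        slots.acquire()                   # waits while MAX_WORKERS scenarios are running
        worker = threading.Thread(target=run_scenario, args=(name, source, results, slots))
        worker.start()
        workers.append(worker)
    for worker in workers:
        worker.join()
    return results
```

With these settings, buffered data is limited to roughly MAX_WORKERS × (MAX_BUFFERED_CHUNKS + 2) × CHUNK_SIZE (the queue plus one chunk held by each reader and each processor); the arrays built during processing come on top of that.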

Below is what happens if I launch too many threads ("too many" depending on the system RAM).
The problem is that the RAM usage slope never flattens; it just keeps growing. It drops from time to time (a few threads finishing, the GC doing its housekeeping), but as soon as new threads are launched it bounces back, higher and higher as time goes on (probably a leak here).

[Image: screenshot of the RAM usage graph described above]

Over time, I have asked this question, that one, and this other one. I have now reached the point where the more I learn about this, the less I REALLY understand. (Sigh)

Questions
First, I would like a clear explanation of (or good resources on) how exactly memory is managed in .NET.
I know there is a lot of literature and many blog articles on this wide topic, but the subject is so large that I really do not know where to start without getting lost (and wasting my company's precious time). I want to stay focused and objective-oriented.

Second, the "tilt" came when I tried to simulate/reproduce the steep slope above (created "naturally" by the RAM'osaurus Monster) in lab conditions, in order to isolate this behavior and find a cure for it (i.e., monitoring the memory used by the process and limiting it to X% of total memory by stopping any new thread creation).
Why was it so hard to reproduce in lab conditions (with some memory-consuming multithreaded loop) what my rogue program seems to do so easily (bringing the RAM to its knees in the Task Manager)? The GC seemed to clean things up regularly in the prototype, while my application seems to keep a lot of referenced objects in RAM. How come?
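The cure described above (stop launching new threads while the process is above X% of total memory) could look like the following sketch. It rests on assumptions: `used_bytes` is a hypothetical probe you would supply (in .NET, for example, `GC.GetTotalMemory(false)` for the managed heap or `Process.GetCurrentProcess().PrivateMemorySize64` for the whole process), `total_bytes` is the machine's RAM, and `launch_with_memory_gate` is an illustrative name, not an existing API.

```python
import threading
import time
from typing import Callable, Iterable


def launch_with_memory_gate(
    tasks: Iterable,
    worker: Callable[[object], None],
    used_bytes: Callable[[], int],   # hypothetical probe: memory currently used by the process
    total_bytes: int,                # total RAM of the machine
    max_fraction: float = 0.8,       # the "X%" of total memory
    poll_seconds: float = 0.05,
) -> None:
    """Starts one thread per task, but never while memory use is at or above the limit."""
    limit = int(total_bytes * max_fraction)
    threads = []
    for task in tasks:
        # Hold back new work while over budget; running workers keep going and
        # (hopefully) release memory as they finish.
        while used_bytes() >= limit:
            time.sleep(poll_seconds)
        thread = threading.Thread(target=worker, args=(task,))
        thread.start()
        threads.append(thread)
    for thread in threads:
        thread.join()
```

A gate like this only holds back new work; it does not free anything. If the memory is held by leaked references rather than by running workers, the polling loop never exits, which is one more reason to keep hunting the leaks first.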

A third puzzling thing is the way caching works: why does Firefox use up to 1 GB of memory and use the cache, while my app just blows the RAM? Could it be that the data is used so heavily/frequently by the running threads that it never gets cached?

Tracks and Leads
I have already used the VS profiler and identified a few bottlenecks (large array sorting), and used GCRoot and the WinDBG console. Each time, I tracked down further leaks and fixed some of them (more or less), such as missing event unsubscriptions.
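Missing event unsubscriptions are a classic way for a managed program to keep memory reachable, and they may also explain the lab-versus-application difference: a test loop whose buffers become unreachable gets reclaimed by the GC, while a worker that is still referenced from a long-lived object survives every collection. The sketch below shows the mechanism in Python; the class and function names are hypothetical. In C#, the equivalent is `feed.DataArrived += worker.OnData` on a long-lived publisher without a matching `-=`: the delegate keeps the worker and its arrays reachable, which is the kind of root chain GCRoot reports.

```python
import gc
import weakref


class DataFeed:
    """Long-lived publisher (think: a static or application-wide event source)."""

    def __init__(self):
        self._handlers = []

    def subscribe(self, handler):
        self._handlers.append(handler)

    def unsubscribe(self, handler):
        self._handlers.remove(handler)


class ScenarioWorker:
    def __init__(self, feed: DataFeed, unsubscribe_when_done: bool):
        self.buffer = bytearray(1_000_000)   # stands in for the large per-scenario arrays
        self._feed = feed
        self._unsubscribe = unsubscribe_when_done
        feed.subscribe(self.on_data)         # the bound method holds a strong reference to self

    def on_data(self, chunk):
        pass

    def finish(self):
        if self._unsubscribe:
            self._feed.unsubscribe(self.on_data)


def worker_survives(unsubscribe_when_done: bool) -> bool:
    """True if the worker is still alive after its owner drops it and a collection runs."""
    feed = DataFeed()
    worker = ScenarioWorker(feed, unsubscribe_when_done)
    worker.finish()
    probe = weakref.ref(worker)
    del worker                   # the owner drops its reference...
    gc.collect()
    return probe() is not None   # ...but the feed may still be holding on to it
```

`worker_survives(False)` returns `True` (the feed still references the worker and its buffer), while `worker_survives(True)` returns `False`.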

I know my understanding of some basic notions about memory is still poor; that's a fact. That's why I am asking for a lead here. I am prepared to read sensible resources, learn more, and follow some enlightened advice.

Mehdi LAMRANI
  • How many threads are you creating? Each thread uses 1MB of stack space. Also, are you watching the Large Object Heap counters? Perhaps you have a leak and/or fragmentation there. – Chris O Jan 31 '12 at 00:42
  • @ChrisO It depends on the available RAM. N usually ranges from 10 to 50. – Mehdi LAMRANI Jan 31 '12 at 01:08
  • You are using a lot of RAM because you have so much of it. You have 5 *gigabytes* not being used at all, pretty wasteful :) – Hans Passant Jan 31 '12 at 01:30
  • @HansPassant Hi Hans! (Are you paid by SOF to haunt the forum? :-P) Well yeah, this is an HPC, actually. I am launching heavy computations on it, and the fact that the RAM gets saturated stops me from using it as efficiently as I would like. And 30 minutes after this snapshot, the RAM is so full that the machine hangs. – Mehdi LAMRANI Feb 01 '12 at 12:34
  • Next time, wait those 30 minutes before taking the screenshot :P – Hans Passant Feb 01 '12 at 12:53
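A back-of-the-envelope estimate helps when sizing N. Taking the comment's 1 MB of stack per thread at face value and the ~100 MB read buffer per test from the question, the buffers dominate and the stacks are about 1% of the total. The arrays tracked during processing are not quantified in the question, so they appear here as a hypothetical `extra_mb` parameter; the function names are illustrative.

```python
def estimated_peak_mb(workers: int, buffer_mb: int = 100, stack_mb: int = 1, extra_mb: int = 0) -> int:
    """Rough estimate of memory held by `workers` concurrent tests, in MB."""
    return workers * (buffer_mb + stack_mb + extra_mb)


def max_workers(budget_mb: int, buffer_mb: int = 100, stack_mb: int = 1, extra_mb: int = 0) -> int:
    """Largest N whose estimated footprint fits in budget_mb, but at least 1."""
    return max(1, budget_mb // (buffer_mb + stack_mb + extra_mb))


for n in (10, 50):
    print(n, "workers ->", estimated_peak_mb(n), "MB")
# 10 workers -> 1010 MB
# 50 workers -> 5050 MB
```

Memory that leaks from finished workers is not part of this estimate, which is why a correctly sized N can still climb over time.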

1 Answer


Not so much an answer as an observation:

There are so many different things that could impact this that it's not funny. The system you've described sounds like it has hundreds of different things you need to check/watch.

First and foremost among them is tracking down every single memory leak you have. You've started on that; I suggest you just keep going.

One note regarding Firefox and "the cache": you're talking about the Windows paging file, which Windows controls. It has its own rules about when data is paged to disk in order to free up RAM. I suspect this is a red herring, and you don't want to go down that path. One reason data gets paged to disk is lack of use. With Firefox and other apps this is pretty common; from your description, your software doesn't behave like this.

I'd look into how tight your loops are and whether you even give the .NET runtime an opportunity to perform cleanup. Judging by the periodic drops in your graph, the GC does step in as usage rises, so this is probably OK.

Either way, I don't understand exactly what the problem is. You mention that the graph is the result of launching too many threads. Okay, that's expected. Every thread you launch takes resources, so launching large numbers of them will by nature use large amounts of resources. At some point the system will not be able to launch more, and it will cascade into failure.

Some things that may differ between your lab and production systems: different patch levels of Windows/.NET/etc., different operating systems (2008 vs 2008 R2, for example, is a huge difference), and different CPU and/or memory configurations. With what you've given, there is really no telling.

To reproduce an event, you need exactly matching hardware and software, and then identical inputs. If any of those are off, you might only be able to get close. The first thing to do is to ensure that all of the loaded software and the inputs are identical; those are easily controlled. Then start playing.


Slight update: I read some of the questions you linked.

To determine the optimum number of threads for your system, you must profile on the target hardware/software, preferably on the system itself. This is going to take a fair amount of hand-holding to get right. The only way to automate it would be a watchdog that monitors how things are running and increases or decreases the number of threads based on failure rates. It is a bit like how a number of enthusiast computers can now automatically overclock themselves: keep turning up the heat until the thing melts, then back off a notch.
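The watchdog idea above can be reduced to a single tuning step run periodically: add a thread while things are healthy, and remove one as soon as failures or memory pressure appear. This is a sketch only; the function names, thresholds, failure-rate metric and memory fraction are hypothetical inputs you would have to measure on the target machine.

```python
def next_thread_count(current: int, failure_rate: float, memory_fraction: float,
                      min_threads: int = 1, max_threads: int = 64,
                      max_failure_rate: float = 0.0, memory_ceiling: float = 0.85) -> int:
    """One watchdog step: back off a notch on trouble, otherwise turn the heat up a notch."""
    if failure_rate > max_failure_rate or memory_fraction > memory_ceiling:
        return max(min_threads, current - 1)
    return min(max_threads, current + 1)


def tune(samples, start: int = 4, **limits) -> list:
    """Replays (failure_rate, memory_fraction) samples and records the thread count after each."""
    count = start
    history = [count]
    for failure_rate, memory_fraction in samples:
        count = next_thread_count(count, failure_rate, memory_fraction, **limits)
        history.append(count)
    return history


print(tune([(0.0, 0.5), (0.0, 0.6), (0.0, 0.9), (0.1, 0.7), (0.0, 0.5)]))
# [4, 5, 6, 5, 4, 5]
```

Stepping down by one mirrors "back off a notch"; if memory climbs quickly, a more aggressive decrease (for example halving, as in AIMD congestion control) may settle faster.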

Of course you may still run into issues if some percentage of threads go beyond what you would normally allow OR if some other piece of software starts up and takes away resources.


tl;dr: you have to hand-tune this on the machine it's going to live on. Be prepared to retune often.

NotMe