Clarified question (tl;dr)
After reading and profiling (all the results are covered below), the problem seems to boil down to the GC not collecting the Gen 0 heap for our applications when in Server mode. As soon as it's changed to Workstation mode, the problem goes away.
Original question and details
My question is somewhat related to this question and this question.
We recently ran into what seemed to be a memory leak in our .NET applications on our test environments. The worker processes would climb to around 450MB of memory usage, either quickly when under load or gradually when under no load.
The problem could not be replicated on our development environments. The primary difference is that the dev environments are physical servers, whereas the test environments are virtualized and controlled by Puppet (apart from that, I do not have much knowledge of the environments themselves).
To see which objects were responsible for taking up all that memory, I ran ANTS Memory Profiler on the test server. I found that all of that memory was reported as unused and was never being released.
While researching what could cause this, I came across this forum post, which in turn led me to this article.
I ended up trying the configuration it recommends, to put the GC into workstation mode:
<configuration>
    <runtime>
        <gcServer enabled="false"/>
        <gcConcurrent enabled="false"/>
    </runtime>
</configuration>
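To compare what each environment's config file actually declares, something like the following could be used. This is only a minimal sketch in Python (chosen for illustration, not part of our stack); the function name `read_gc_settings` and the inlined `SAMPLE_CONFIG` are my own placeholders, and it assumes the config file has no XML namespace on these elements. It reports only what the file declares, not which GC mode the runtime ends up using.

```python
import xml.etree.ElementTree as ET

# Same settings as the config above, inlined so the sketch is self-contained.
# In practice the text would be read from the environment's config file.
SAMPLE_CONFIG = """
<configuration>
    <runtime>
        <gcServer enabled="false"/>
        <gcConcurrent enabled="false"/>
    </runtime>
</configuration>
"""


def read_gc_settings(config_xml):
    """Return the gcServer / gcConcurrent flags declared in a config document.

    A value of None means the element is absent, so whatever default the
    runtime or host applies is in effect (this sketch does not guess it).
    """
    root = ET.fromstring(config_xml.strip())
    settings = {}
    for name in ("gcServer", "gcConcurrent"):
        element = root.find("./runtime/" + name)
        if element is None:
            settings[name] = None
        else:
            settings[name] = element.get("enabled", "").strip().lower() == "true"
    return settings


if __name__ == "__main__":
    print(read_gc_settings(SAMPLE_CONFIG))
```

Running it against the config text from both the dev and test servers would show whether the two files differ in these two elements.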
After running iisreset and re-running my memory profiling, the issue was completely gone. That is great, but it still doesn't really explain what was happening in the first place.
I did do some more reading and found this SO question, which leads me to believe that this configuration change may end up being detrimental to the throughput of our applications.
So my question is: What would cause an IIS worker process to accumulate a large amount of unused memory that never gets garbage collected?
Edit: To clarify my question a bit more, I believe we have proven that the code is not responsible for this, as the exact same code does not experience this issue on the dev environment.
Here are the screenshots I took of my memory profiling before and after the configuration change. There isn't a lot of information here, but the graph does show the memory trend nicely.
Edit 2: Here are the server specs from what I could gather. I can possibly get more, but it will take time.
Dev Environment: Physical Machine
CPU: Single core
Memory: 6GB
Test Environment: Virtual Machine
CPU: 4 logical threads (I cannot comment on CPU count)
Memory: 8GB
The only difference in the Machine.Config files is that the dev environment is adding "Microsoft.VisualStudio.Diagnostics.ServiceModelSink.Behavior" to both endpoint and service behaviours.
And the test environment currently has the GC settings previously mentioned in the aspnet.config file.
Edit 3: I did some more profiling and noticed a couple more counters that I could add within ANTS. Notably, I added "Gen 0 heap size", and it looks like this is the source of the issues. With the GC in server mode, when I trigger the test I'm using for profiling, this line immediately jumps up to ~300MB, then comes back down to ~230MB but never goes all the way back down (graph below).
Running the same profiling with the GC in workstation mode sees the Gen 0 heap size have a much smaller initial spike and return to essentially zero when the requests have finished (graph below).
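To put the two graphs into numbers, the comparison could be sketched as below. This is a hedged illustration in Python, not something the profiler provides: `summarize_gen0` and its `settle_fraction` threshold are hypothetical, and the sample series are made-up values that only loosely follow the approximate figures I described (the workstation numbers in particular are invented to show the shape).

```python
def summarize_gen0(samples_mb, settle_fraction=0.1):
    """Summarize a 'Gen 0 heap size' series given in MB, in time order.

    The last reading is assumed to be taken after the load has stopped.
    settle_fraction is an arbitrary threshold: the heap counts as "released"
    if the final reading is at most this fraction of the peak.
    """
    if not samples_mb:
        raise ValueError("need at least one sample")
    peak = max(samples_mb)
    final = samples_mb[-1]
    released = peak == 0 or final <= settle_fraction * peak
    return {"peak_mb": peak, "final_mb": final, "released": released}


# Illustrative values only, not exported profiler data.
server_mode = [0, 300, 260, 230, 230, 230]
workstation_mode = [0, 40, 15, 2, 0, 0]

if __name__ == "__main__":
    print("server:", summarize_gen0(server_mode))
    print("workstation:", summarize_gen0(workstation_mode))
```

With these illustrative numbers the server series is flagged as not released and the workstation series as released, which is the pattern I am seeing.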
Doing some more searching on this led me to another, much more closely related SO question. However, his findings were that this memory usage was a non-issue, whereas in my case the service actually needs to be manually restarted at least once a day.
I also found this article that had the following to say on the issue (which seems to describe what is happening almost perfectly):
Generation 0 is likely to have a larger number of objects on a 64-bit system, especially when you use server garbage collection instead of workstation garbage collection. This is because the threshold to trigger a generation 0 garbage collection is higher in these environments, and generation 0 collections can get much bigger. Performance is improved when an application allocates more memory before a garbage collection is triggered.
Though the problem still remains that in Server mode the Generation 0 heap is seemingly never collected as opposed to just not as often.