
Clarified question (tl;dr)

After reading and profiling, with all the results covered below, the problem seems to boil down to the GC not collecting the Gen 0 heap for our applications when in Server mode. As soon as it's changed to Workstation mode, the problem goes away.

Original question and details

My question is somewhat related to this question and this question.

We recently ran into what seemed to be a memory leak in our .NET applications on our test environments: the worker processes would climb to around 450MB of memory usage, either quickly when under load or gradually when under no load.

The problem could not be replicated on our development environments. The primary difference is that the dev environments are physical servers, whereas the test environments are virtualized and controlled by Puppet (apart from that, I do not have much knowledge of the environments themselves).

In the hope of seeing which objects were responsible for taking up all that memory, I ran ANTS Memory Profiler on the test server. I found that all of that memory remained unused and was never released.

While researching what could cause this, I came across this forum post, which in turn led me to this article.

I ended up trying the configuration it recommends, to put the GC into workstation mode:

<configuration>
  <runtime>
    <gcServer enabled="false"/>
    <gcConcurrent enabled="false"/>
  </runtime>
</configuration>

After running iisreset and re-running my memory profiling, the issue was completely gone, which is great but still doesn't explain what was happening in the first place.

I did do some more reading and found this SO question, which leads me to believe that this configuration change may end up being detrimental to the throughput of our applications.

So my question is: What would cause an IIS worker process to accumulate a large amount of unused memory that never gets garbage collected?

Edit: To clarify my question a bit more, I believe we have proven that the code is not responsible for this, as the exact same code does not experience this issue on the dev environment.

Here are the screenshots I took of my memory profiling before and after the configuration change. There isn't a lot of information here, but the graph does show the memory trend nicely.

[Screenshot: before configuration change] [Screenshot: after configuration change]

Edit 2: Here are the server specs from what I could gather. I can possibly get more; it will just take time.

Dev Environment: Physical machine. CPU: single core. Memory: 6GB.

Test Environment: Virtual machine. CPU: 4 logical threads (I cannot comment on CPU count). Memory: 8GB.

The only difference in the Machine.Config files is that the dev environment is adding "Microsoft.VisualStudio.Diagnostics.ServiceModelSink.Behavior" to both endpoint and service behaviours.

And the test environment currently has the GC settings previously mentioned in the aspnet.config file.

Edit 3: I did some more profiling and noticed a couple more counters that I could add within ANTS. Notably, I added "Gen 0 heap size", and it looks like this is the source of the issues. With the GC in server mode, when I trigger the test I'm using for profiling, this line immediately jumps up to ~300MB, then comes back down to ~230MB but never goes all the way back down (graph below).

[Graph: Gen 0 heap size, before]

Running the same profiling with the GC in workstation mode sees the Gen 0 heap size have a much smaller initial spike and return back to essentially zero when the requests have finished (graph below).

[Graph: Gen 0 heap size, after]

Doing some more searching on this led me to another, much more closely related SO question. However, the asker's findings were that this memory usage was a non-issue, whereas in my case the service actually needs to be manually restarted at least once a day.

I also found this article, which had the following to say on the issue (and which seems to describe what is happening almost perfectly):

Generation 0 is likely to have a larger number of objects on a 64-bit system, especially when you use server garbage collection instead of workstation garbage collection. This is because the threshold to trigger a generation 0 garbage collection is higher in these environments, and generation 0 collections can get much bigger. Performance is improved when an application allocates more memory before a garbage collection is triggered.

The problem still remains, though, that in Server mode the Generation 0 heap is seemingly never collected, as opposed to just being collected less often.

  • Could be any number of a million things. Without knowing more about your application, the kind of code running on it, and maybe even the code itself it would be difficult to say. – Ryan Mann Mar 23 '15 at 17:41
  • Just an idea: if you have more than 1 worker process enabled on the application pools and you don't have an Idle Timeout set, it is possible for heavy load to spin up extra worker processes, which then never shut down when not under heavy load because they don't idle out. – Ryan Mann Mar 23 '15 at 17:43
  • Also, memory can be allocated that is not used by .NET if you have code that did not call Dispose to release unmanaged resources. E.g. say you open a file stream, but don't dispose of it, and it falls out of scope. The GC will clean up all the .NET components of it, but the unmanaged part (the file handle in the Windows API) will still be open, and ANTS Profiler probably doesn't detect unmanaged resources that are in use. – Ryan Mann Mar 23 '15 at 17:46
  • @Ryios, I don't believe this can be a code issue, for reasons I touched on at the top of the question and that I have clarified in an edit. –  Mar 23 '15 at 18:20
  • @Ryios, if unmanaged resources were leaking, I believe I would see that in my memory profiling. I'll see if I can redact my screen caps of the profiling enough to post. –  Mar 23 '15 at 18:23
  • @Phaeze The test environment has the same number of CPUs as the dev environment? Are there any GC settings in the global machine.config or web.config in those environments? – Sebastian Mar 23 '15 at 19:31
  • @lowleveldesign I've added what I can get immediately for the machine specs, as well as one difference in the machine.config files. But apart from the gc config i talk about there are no gc related settings in test and no gc related settings at all in dev. –  Mar 23 '15 at 19:51
  • You could be seeing a difference between dev and production depending on the configuration of the application pools running the IIS worker processes (W3wp.exe's) being different on the two boxes. The most obvious factor could be if production's Application pool is running on .Net 2 and your code is
    – Ryan Mann Mar 23 '15 at 21:06
  • @Ryios both environments have identical App pool configs and are Integrated 4.0. –  Mar 23 '15 at 21:51
  • Interesting, is one 32 bit and one 64 bit or both 64 bit? – Ryan Mann Mar 24 '15 at 00:37
  • @Ryios both are running 64 bit –  Mar 24 '15 at 15:10

3 Answers

1

After much research, reading, and profiling, I have been able to show that our IIS memory usage is actually within normal bounds. I did this by using the Sysinternals Testlimit utility to push the server's physical memory usage to near its maximum; once that happened, all of our applications released their memory.

We do still have some kind of memory issue in our test environment that I need to investigate, but at this point I think I can confidently say that it is entirely unrelated.

Moral of the story is to not assume that the cause reported with the issue is correct.

1

You can try enabling the gcTrimCommitOnLowMemory setting in the Aspnet.config file in the .NET Framework directory:

When the gcTrimCommitOnLowMemory setting is enabled, the garbage collector evaluates the system memory load and enters a trimming mode when the load reaches 90%. It maintains the trimming mode until the load drops under 85%.

https://msdn.microsoft.com/en-us/library/bb384209(v=vs.110).aspx
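As a minimal sketch, assuming the same Aspnet.config layout as the gcServer example in the question, enabling the setting would look something like this (check the linked documentation for the exact behaviour on your framework version):

```xml
<configuration>
  <runtime>
    <!-- Ask the GC to trim committed memory when system memory load is high -->
    <gcTrimCommitOnLowMemory enabled="true"/>
  </runtime>
</configuration>
```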

Another option (since .net v4.5) is to set performanceScenario to "HighDensityWebHosting" in the same Aspnet.config file. This is useful for shared hosting scenarios as it will "tune garbage collection to optimize for memory": http://www.asp.net/aspnet/overview/aspnet-and-visual-studio-2012/whats-new#_Toc_perf_5
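A sketch of that setting, again assuming it goes under the runtime element of Aspnet.config and that the machine is running .NET 4.5 or later:

```xml
<configuration>
  <runtime>
    <!-- Tune the GC for memory rather than throughput (shared hosting scenarios) -->
    <performanceScenario value="HighDensityWebHosting"/>
  </runtime>
</configuration>
```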

As you can see from CoreCLR sources the HighDensityWebHosting option mostly disables gcServer and gcConcurrent settings, but enables gcTrimCommitOnLowMemory: https://github.com/dotnet/coreclr/blob/cbf46fb0b6a0b209ed1caf4a680910b383e68cba/src/vm/perfdefaults.cpp

Vlad Rudenko
  • The *HighDensityWebHosting* setting can reduce memory usage several times over on some servers. In some cases CPU usage will be reduced as well. Keeping memory usage stable makes it possible to reduce the number of nodes in a web farm. – Vlad Rudenko May 21 '15 at 19:14
0

Not a direct answer, more like a band-aid, but if you can run .NET 4.5.1 code in that w3wp process, you can compact the LOH, and a lot of that unused allocated memory [might] be released.

You could create App Start code that starts a timer that runs this every so often from inside the w3wp.exe process.

// Requires: using System; using System.Runtime;
// Request LOH compaction on the next blocking gen 2 collection.
GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;

// Force that collection now; the mode resets to Default afterwards.
GC.Collect();

This feature was not added until 4.5.1, though, so you can't use it in a .NET assembly that is not targeting at least version 4.5.1 of the framework.

This might allow you to get rid of the web.config changes you made and keep the unused memory from staying high when it is not needed.
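The timer idea could be sketched roughly as follows. The class name LohCompactionTimer and the choice of interval are hypothetical, not something from the framework, and forcing full collections on a schedule has its own cost, so treat this as a starting point under those assumptions rather than a recommendation.

```csharp
using System;
using System.Runtime;
using System.Threading;

// Hypothetical helper: the class name and interval are illustrative only.
public static class LohCompactionTimer
{
    // Keep a static reference so the timer itself is not garbage collected.
    private static Timer _timer;

    // Call once at application start, e.g. from Application_Start in Global.asax.
    public static void Start(TimeSpan interval)
    {
        _timer = new Timer(_ => CompactOnce(), null, interval, interval);
    }

    private static void CompactOnce()
    {
        // Request LOH compaction on the next blocking gen 2 collection...
        GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
        // ...and force that collection now. The mode resets to Default afterwards.
        GC.Collect();
    }
}
```

Usage would be a single call such as LohCompactionTimer.Start(TimeSpan.FromMinutes(30)) during startup, where the 30 minutes is an arbitrary example value.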

Ryan Mann
  • I'm wary of trying this, as we have a configuration change that does remove the issue, and my question is asking what the cause is. It could definitely be worth trying for others with similar issues, though. –  Mar 24 '15 at 15:12
  • Yeah, I know it's not an answer, I just can't post code in comments. If I had an answer I would post it; I'm out of ideas. Unless the operating systems of the two environments differ and/or Superfetch is running on production, that's about all I've got left, and no one else is replying. – Ryan Mann Mar 24 '15 at 16:28
  • It's all good, I appreciate your attempts at least :). And it looks like we are opening an MS support ticket, so I may have an answer soonish... hopefully –  Mar 24 '15 at 16:33