
I'm designing a data structure that does a lot of object allocations in order to perform its function. It takes around 500 milliseconds to add one million items to it.

I've been looking for ways to optimize its functionality and just make it generally faster and I feel like I've exhausted other potential methods of increasing its performance.

While reading about garbage collection, I noticed there was an option for enabling server garbage collection and on a whim I just set it to true, and it caused the runtime to go from 500 ms to 200 ms: an unbelievable improvement!

I read that it uses multiple cores to run the garbage collector. I'm wondering a couple of things. Most people have multicore systems nowadays, so why isn't this included by default? Also, if I package my data structure as a DLL, would it be possible to have this enabled by default, perhaps only for my library and not for the rest of the application if the client prefers otherwise?

I have a feeling I'm not really going to be able to leverage this power, but perhaps there is a way to steal a couple of techniques from this and incorporate them somehow into my own program? How is server garbage collection so fast? Is it just the fact that it uses multiple cores, or are there other features that I might be able to clone in my own code?

John Smith
  • _"I'm designing a data structure that does a lot of **object allocations** in order to perform its function. It takes around 500 milliseconds to add one million items to it."_ - if all you are doing `is allocating memory`, exactly `where is the GC relevant`? GC is generally for releasing memory unless you are generating alot of temporary objects during `foreach` loops. And even then though wont be a problem until the next GC. Impossible to say without seeing **how you are allocating** these _"one million items"_ –  Mar 29 '15 at 06:59
  • I've seen effects like this especially when your % time in GC is high (>30%). Whether server GC is appropriate depends a lot on a consuming application's uses. Client type apps generally want a "smooth" experience, this means smaller more consistent GCs. While server apps can generally hiccup without affecting user experience. Note that client and server GC mode is distinct from whether collection is concurrent. Server GC mode generally has higher limits for the acceptable amount of memory allocated before a GC is triggered. – Mike Zboray Mar 29 '15 at 07:00
  • Yeah I don't really know much about garbage collection honestly. I don't release many objects, but I know a lot of memory is being moved around, so I think that might impact it, being able to do that on another core. – John Smith Mar 29 '15 at 07:02
  • Server GC is resource intensive. It creates one thread per processor with `Highest` priority. Make sure you [read this fully](https://msdn.microsoft.com/en-us/library/ee787088%28v=vs.110%29.aspx) before messing with server GC. Btw, starting from CLR 4.0, background GC is enabled, which performs very well, with very little pause time. Your problem seems to be an XY problem; solve the actual problem instead of hacking around with GC settings. – Sriram Sakthivel Mar 29 '15 at 07:03
  • Also, concurrent collection is the default as of .NET 4.5 for both server and client GC. Prior to this, concurrent was not available for server GC. My observation of our memory profile after enabling server GC was that we simply had fewer collections but they were bigger, causing big swings in memory usage. This might be acceptable for a server-type app but not a client app. If you are writing a library you won't have control over this. I would focus on reducing allocations. Make objects smaller. Switch to structs. Use flyweights, caching, or pooling. – Mike Zboray Mar 29 '15 at 07:06
  • @mikez: How am I supposed to use structs when allocating 500,000 KeyValuePairs of strings? It's better to have them as a class rather than box onto the heap. – John Smith Mar 29 '15 at 07:27
  • @SriramSakthivel: I mean, isn't 200 ms pretty good for inserting half of a million key value pairs? I've been looking for other solutions for weeks but I'm not sure what else I can do. Maybe I should hire a professional optimizer to review the code? – John Smith Mar 29 '15 at 07:27
  • You say *inserting half of a million key value pairs*, which means you hold the records in memory, so how does the GC come into the picture? Where is the garbage generated? Your question is really unclear about where you create garbage. Post code; that may make your question better. – Sriram Sakthivel Mar 29 '15 at 07:29
  • WHAT!!! The C# garbage collector is not activated by default? I thought garbage collection is automatic when an object is destroyed. That is why in C++ you have to handle memory manually and in C# you don't, one of the main upsides of C#. I never thought you would need to enable it. The only thing I know is that to manipulate it you need to override the Finalize() method, as that gets called during garbage collection. So my question is, how do you enable garbage collection? – Aizen Mar 29 '15 at 07:30
  • @JohnSmith you said nothing about the implementation so I have no idea what exactly applies. Those are general techniques. Obviously you need to be strategic and measure. KeyValuePair is a struct, so if you are boxing it multiple times you may want to change to a class to avoid those allocations. – Mike Zboray Mar 29 '15 at 07:50
  • @Aizen _"WHAT!!! C# Garbage collector is not activated by default?"_ - it will run by default at a _certain time_ generally not known to your app. Except of course when you explicitly invoke it. –  Mar 29 '15 at 08:37

3 Answers


Server GC is tuned very differently from workstation GC. It makes very different assumptions about the kind of machine and the kind of application. A web server would be the canonical example: it runs on relatively beefy hardware with lots of RAM and a decent processor with plenty of cores, and it is the only app that runs on the machine, so it can pretty much demand the use of all of the machine's resources.

Workstation GC is tuned to work well on the kind of machine that's on your desktop. You run many programs on it and hate it when one of them makes your mouse sluggish to respond because that program is hogging the processor and disk. And you don't like it when a program stops being responsive for a while because it is busy collecting garbage.

Collection pauses used to be the most important selector: workstation GC always had decent support for doing the heavy collections in the background, so the program stays responsive. That's not so relevant anymore today; server GC acquired the ability to do background collections in .NET 4.5.

Profiling is tricky, so do make sure you are not being misled by another aspect of server GC. It creates much bigger heap segments, so it is easy to profile your code and find a happy number, one where the execution of the code was not interrupted by one or more gen #2 collections. That does not mean you'll get this perf consistently; sooner or later you have to pay the price. Background collections only work well when you don't need a gen #2 collection but can survive on just gen #0 and #1 collections while the background collection is trundling along. In other words, a significant number of the collections you make need to be for short-lived objects. Not your kind of code, by the sound of it.

Responsible profiling requires a quality that not many programmers possess: you have to be ready to take bad news in stride and assume the glass is half-empty. Run the test many times, not just once. Avoid startup and initialization artifacts. Take the median of the measurements as the measurement result, with an eye on the worst-case one. And ignore the best ones; you just can't rely on those ever reproducing in practice. Best to build the measurement into the final software so you always get real numbers; PerformanceCounter is good for that.
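To make that concrete, here is a rough sketch of such a harness (the run count, the `BuildDataStructure` name, and the output format are placeholders, not anything from the question). It records per-run timings along with how many gen #2 collections happened during each run, then reports the median and the worst case rather than the best:

```csharp
using System;
using System.Diagnostics;

static class InsertionBenchmark
{
    static void Main()
    {
        const int runs = 50;                 // arbitrary; more runs give a steadier median
        double[] timings = new double[runs];

        for (int i = 0; i < runs; i++)
        {
            int gen2Before = GC.CollectionCount(2);
            Stopwatch sw = Stopwatch.StartNew();

            BuildDataStructure();            // hypothetical stand-in for the insertion loop

            sw.Stop();
            timings[i] = sw.Elapsed.TotalMilliseconds;
            int gen2During = GC.CollectionCount(2) - gen2Before;
            Console.WriteLine("run {0}: {1:F1} ms, gen #2 collections: {2}",
                i, timings[i], gen2During);
        }

        Array.Sort(timings);
        Console.WriteLine("median: {0:F1} ms, worst: {1:F1} ms",
            timings[runs / 2], timings[runs - 1]);
    }

    static void BuildDataStructure()
    {
        // placeholder: add the million items to the data structure here
    }
}
```

Printing the gen #2 count per run makes it obvious which runs got "lucky" and dodged a full collection, which is exactly the misleading case described above.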

Hans Passant
  • The tests have been run many times. For the past couple of weeks I've probably run the programs thousands of times trying different micro-optimizations. Right after I changed this flag I am consistently getting 150% speed increases: where I was previously getting 495-505 ms *every* single run, now I am getting 195-205 ms every run. I start a stopwatch right before the loop, then loop through a list adding each element to my data structure, and stop the stopwatch right after the loop. – John Smith Mar 29 '15 at 08:40

`new` is a fairly expensive operation, no matter how much it is optimized, and every block of memory you allocate will sooner or later need to be collected. A common trick is to include in each block a forward pointer (excuse me, reference) and, whenever you know you no longer need a block, rather than just letting go of it, you link it into a free list of blocks of that type. Then when you need one, first look to see if there are any on the free list. If so, use that instead of doing a `new`.

This way, you may be able to avoid most of the `new` calls and their corresponding GC work.
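A minimal sketch of that idea, assuming a singly linked node type (the `Node<T>` and `NodePool<T>` names here are made up for illustration; nothing in the question says the data structure actually looks like this):

```csharp
using System.Collections.Generic;

// Hypothetical node type for a linked structure; not taken from the question.
public sealed class Node<T>
{
    public T Value;
    public Node<T> Next;
}

// A pool that hands out recycled nodes before falling back to `new`.
public sealed class NodePool<T>
{
    private readonly Stack<Node<T>> _free = new Stack<Node<T>>();

    public Node<T> Rent(T value)
    {
        if (_free.Count > 0)
        {
            Node<T> node = _free.Pop();        // reuse a node released earlier
            node.Value = value;
            node.Next = null;
            return node;
        }
        return new Node<T> { Value = value };  // only allocate when the free list is empty
    }

    public void Return(Node<T> node)
    {
        node.Value = default(T);               // drop references so pooled nodes don't keep objects alive
        node.Next = null;
        _free.Push(node);
    }
}
```

Clearing the references in `Return` matters; otherwise the pooled nodes keep otherwise-dead objects reachable and defeat the purpose.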

This is only worth doing if you know for certain that `new` (and thus GC) takes a large fraction of wall-clock time. The way I determine that is by random pausing.

Mike Dunlavey
  • Then what I would do is apply the diagnostic to see what the current problem is. It may be something else. – Mike Dunlavey Mar 29 '15 at 17:17
  • I'm pretty sure it's because it's a resizable tree with arrays and it's mostly just because the GC likes to move memory around. Not sure what I can do about it though. – John Smith Mar 29 '15 at 17:22
  • @John: I don't want to start haranguing, but when I hear "pretty sure", I think I'm hearing from someone who maybe feels inhibited about actually doing the physical task of profiling, and I told you the method I (and plenty of other people) trust. Once you try it a couple times, and it shows you stuff you could never guess, you will be sure what takes time, not pretty sure. – Mike Dunlavey Mar 29 '15 at 17:54

> Most people have multicore systems nowadays, so why isn't this included by default?

Workstation GC is enabled by default on workstations (Win7/8/XP, consumer OSes). Server GC is enabled by default on server OSes (Windows Server 2003, 2012, etc.). Workstation GC has a lower limit on how much memory can be allocated before a collection kicks in. Depending on your code (which you didn't show), under server GC mode you probably haven't hit that higher limit yet, so the GC simply hasn't run as often. See the section on comparing workstation and server GC modes on MSDN.

> Also, if I package my data structure as a DLL, would it be possible to have this enabled by default, perhaps only for my library and not for the rest of the application if the client prefers otherwise?

This is controlled through the consuming application's app/web.config file, so the client can always override it. You could detect the mode through the GCSettings.IsServerGC property and throw an exception or a warning, but I would highly recommend NOT doing that. Note also that the .NET runtime can override those config settings, for example when there is only 1 CPU on the system.
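For reference, the opt-in and the read-only check look roughly like this (a minimal sketch; `GcModeReport` is just an illustrative name):

```csharp
// The hosting application (not the library) opts in through its own app.config / web.config:
//
//   <configuration>
//     <runtime>
//       <gcServer enabled="true"/>
//     </runtime>
//   </configuration>
//
// A library can only observe which mode actually ended up in effect:
using System;
using System.Runtime;

static class GcModeReport
{
    static void Main()
    {
        Console.WriteLine("Server GC enabled: {0}", GCSettings.IsServerGC);
        Console.WriteLine("Latency mode:      {0}", GCSettings.LatencyMode);
    }
}
```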

> I've been looking for ways to optimize its functionality and just make it generally faster and I feel like I've exhausted other potential methods of increasing its performance.

Take a look at pre-allocating objects or re-using the same allocations. For instance, many .NET collections have an internal array that can be pre-sized ahead of time to reduce the number of memory copies that have to take place as more objects are added to the collection.
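A small sketch of what pre-sizing looks like (the half-million count is borrowed from the comments, and the collection types are just examples, since the actual data structure wasn't shown):

```csharp
using System;
using System.Collections.Generic;

static class PreSizedExample
{
    static void Main()
    {
        const int expected = 500000;   // illustrative count, per the comments above

        // Passing the expected count up front makes the backing storage get
        // allocated once, instead of repeatedly growing and copying as items arrive.
        var pairs = new List<KeyValuePair<string, string>>(expected);
        var index = new Dictionary<string, string>(expected);

        for (int i = 0; i < expected; i++)
        {
            string key = "key" + i;
            pairs.Add(new KeyValuePair<string, string>(key, "value"));
            index[key] = "value";
        }

        Console.WriteLine("added {0} pairs", pairs.Count);
    }
}
```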

Tim P.