Why does setting GCSettings.LatencyMode to GCLatencyMode.LowLatency negatively impact execution time?

Please consider the following code. Note that the thread pool has enough threads, so no latency is introduced there, and the machine has plenty of memory available. Running in LowLatency instead of Interactive causes a threefold increase in execution time.

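using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Runtime;
using System.Threading;
using System.Threading.Tasks;

class Program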
{
    static void Main(string[] args)
    {
        //capture current latency mode
        var currentLatencyMode = GCSettings.LatencyMode;

        //set low latency mode to minimize garbage collection
        GCSettings.LatencyMode = GCLatencyMode.LowLatency;

        var watch = new Stopwatch();
        var numberTasksToSpinOff = 4;
        var numberItems = 20000;
        var random = new Random((int)DateTime.Now.Ticks);
        var dataPoints = Enumerable.Range(1, numberItems).Select(x => random.NextDouble()).ToList();
        var workers = new List<Worker>();

        //structure workers
        for (int i = 1; i <= numberTasksToSpinOff; i++)
        {
            workers.Add(new Worker(i, dataPoints));
        }

        //start timer
        watch.Restart();

        //parallel work
        if (workers.Any())
        {
            var processorCount = Environment.ProcessorCount;
            var parallelOptions = new ParallelOptions { MaxDegreeOfParallelism = processorCount };
            Parallel.ForEach(workers, parallelOptions, DoSomeWork);
        }

        //stop timer
        watch.Stop();

        //reset latency mode
        GCSettings.LatencyMode = currentLatencyMode;

        Console.WriteLine($"Time it took to complete in Milliseconds: {watch.ElapsedMilliseconds}");
        Console.WriteLine("Press key to quit");
        Console.ReadLine();
    }

    private static void DoSomeWork(Worker worker)
    {
        Console.WriteLine($"WorkerId: {worker.WorkerId} -> New Tasks spun off with in Thread Id: {Thread.CurrentThread.ManagedThreadId}");

        var indexPos = 0;
        foreach (var dp in worker.DataPoints)
        {
            var subset = worker.DataPoints.Skip(indexPos).Take(worker.DataPoints.Count - indexPos).ToList();
            indexPos++;
        }
    }
}

public class Worker
{
    public int WorkerId { get; set; }
    public List<double> DataPoints { get; set; }

    public Worker(int workerId, List<double> dataPoints)
    {
        WorkerId = workerId;
        DataPoints = dataPoints;
    }
}
trincot
Matt
  • It takes around 11 gigabytes. I have 96 gigabytes of memory. It is a 64-bit application. – Matt Feb 06 '18 at 09:34
  • I am profiling why 4 or 5 worker threads running in parallel take significantly longer than running 1 or 2 workers. I am looking to parallelize the workers, as you can see in my sample code. – Matt Feb 06 '18 at 09:49
  • For `Interactive` which originally brought me to profile whether it is the garbage collector or memory allocations that cause the overhead. – Matt Feb 06 '18 at 09:52
  • This question of mine is what brought me here: https://stackoverflow.com/questions/48637097/how-to-properly-parallelize-worker-tasks/48637607?noredirect=1#comment84273522_48637607. I am looking to isolate whether the overhead comes from GC, memory allocation performance, or something else. – Matt Feb 06 '18 at 09:55
  • What did you learn when you visualised the GCs? http://mattwarren.org/2016/06/20/Visualising-the-dotNET-Garbage-Collector/ – mjwills Feb 06 '18 at 10:02
  • I would suspect memory allocations. Your `DataPoints` list is about 160 KB. So half of your iterations in `DoSomeWork` are going to create their list on the LOH. And since Gen 2 collections are disabled, every one of those allocations will require getting more memory from the operating system. That's going to be much more expensive than the occasional Gen 2 garbage collection. – Jim Mischel Feb 06 '18 at 23:11
  • @JimMischel, this makes the most sense of all the explanations given. Do you have any recommendations about how I can confirm this train of thought? – Matt Feb 07 '18 at 09:06
  • Use [Performance Monitor](https://learn.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2008-R2-and-2008/cc749115%28v%3dws.10%29) to view the Windows memory performance counters, and the [GC related performance counters](https://learn.microsoft.com/en-us/dotnet/framework/debug-trace-profile/performance-counters#memory). – Jim Mischel Feb 07 '18 at 23:21
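
The same kind of check can also be made from inside the process. The following is a minimal sketch, not something posted in this thread: `GcProbe`, `CountCollections` and `IsOnLargeObjectHeap` are hypothetical helper names, while `GC.CollectionCount`, `GC.GetGeneration` and `GC.MaxGeneration` are standard .NET calls. It reports how many collections of each generation ran during a piece of work, and whether a freshly allocated object landed on the Large Object Heap (such objects report the highest generation immediately).

```csharp
using System;

public static class GcProbe
{
    // Runs the given work and returns, per generation (index 0, 1, 2),
    // how much GC.CollectionCount increased while it ran.
    // Note: a collection of a higher generation also increments the
    // counters of the lower generations.
    public static int[] CountCollections(Action work)
    {
        var before = new int[GC.MaxGeneration + 1];
        for (int gen = 0; gen <= GC.MaxGeneration; gen++)
            before[gen] = GC.CollectionCount(gen);

        work();

        var delta = new int[GC.MaxGeneration + 1];
        for (int gen = 0; gen <= GC.MaxGeneration; gen++)
            delta[gen] = GC.CollectionCount(gen) - before[gen];
        return delta;
    }

    // Assumption: the object was allocated just before this call.
    // Small objects start in generation 0; objects on the Large Object
    // Heap report GC.MaxGeneration right away. An old object that was
    // promoted would also report GC.MaxGeneration, so this is only
    // meaningful for fresh allocations.
    public static bool IsOnLargeObjectHeap(object freshlyAllocated)
    {
        return GC.GetGeneration(freshlyAllocated) == GC.MaxGeneration;
    }
}
```

Wrapping the `Parallel.ForEach` call from the question in `GcProbe.CountCollections(() => ...)` and printing the three counts for both latency modes would show whether gen 2 collections really are absent under `LowLatency`. Passing one of the larger `subset` lists' backing sizes, for example `new double[20000]`, to `IsOnLargeObjectHeap` would show whether allocations of that size go to the LOH.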

1 Answer

There is no free lunch here. The garbage collector has a job to do and tries to take your concerns into consideration. However, there is no one-size-fits-all setting (especially when you are trying to push its limits).

Latency Modes

To reclaim objects, the garbage collector must stop all the executing threads in an application. In some situations, such as when an application retrieves data or displays content, a full garbage collection can occur at a critical time and impede performance. You can adjust the intrusiveness of the garbage collector by setting the GCSettings.LatencyMode property to one of the System.Runtime.GCLatencyMode values

Furthermore

LowLatency suppresses generation 2 collections and performs only generation 0 and 1 collections. It can be used only for short periods of time. Over longer periods, if the system is under memory pressure, the garbage collector will trigger a collection, which can briefly pause the application and disrupt a time-critical operation. This setting is available only for workstation garbage collection.

During low latency periods, generation 2 collections are suppressed unless the following occurs:

  • The system receives a low memory notification from the operating system.
  • Your application code induces a collection by calling the GC.Collect method and specifying 2 for the generation parameter.

Guidelines for Using Low Latency

When you use LowLatency mode, consider the following guidelines:

  1. Keep the period of time in low latency as short as possible.

  2. Avoid allocating high amounts of memory during low latency periods. Low memory notifications can occur because garbage collection reclaims fewer objects.

  3. While in the low latency mode, minimize the number of allocations you make, in particular allocations onto the Large Object Heap and pinned objects.

  4. Be aware of threads that could be allocating. Because the LatencyMode property setting is process-wide, you could generate an OutOfMemoryException on any thread that may be allocating.

  5. ...
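
Guideline 1 is commonly handled by scoping the mode change so that the previous mode is always restored, even if the work throws. A minimal sketch, assuming a hypothetical helper named `LowLatencyScope.Run` (it is not part of .NET; only `GCSettings.LatencyMode` and `GCLatencyMode` are):

```csharp
using System;
using System.Runtime;

public static class LowLatencyScope
{
    // Runs the given work with GCLatencyMode.LowLatency requested and
    // restores whatever mode was active before, even on exceptions.
    // Per the quoted documentation, LowLatency applies only to
    // workstation garbage collection.
    public static void Run(Action timeCriticalWork)
    {
        GCLatencyMode previous = GCSettings.LatencyMode;
        try
        {
            GCSettings.LatencyMode = GCLatencyMode.LowLatency;
            timeCriticalWork();
        }
        finally
        {
            // Keep the low latency period as short as possible.
            GCSettings.LatencyMode = previous;
        }
    }
}
```

The work passed in should be short and allocate as little as possible, in line with guidelines 2 and 3.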

As per the guidelines (and taking into consideration your previous question, How to properly parallelize worker tasks?), you are using LowLatency outside the conditions it is intended for.

I think the most important points for you are 1 and 3. Either the garbage collector is being forced to clean up by a GC.Collect call, or it decides it needs to clean up because of the massive amount of memory you are allocating, i.e. 11 GB.

The key here is that, without knowing the exact internals of the garbage collector, exactly what you are doing, and why, there may never be a better answer to your question than "in your situation it does impact execution time".

TheGeneral
  • I read this MS page on `LatencyMode` already. With the mode set to `LowLatency`, I confirmed that no gen 2 GC takes place; memory usage grows steadily during the run, which does not occur in `Interactive` mode. I am not looking for a free lunch, I am looking for an explanation of what is happening. If garbage is not collected, then the overhead logically does not come from GC. My question hence is: what causes the overhead when running several worker threads versus running fewer? – Matt Feb 06 '18 at 10:05
  • I am not looking for "recommended practices". I am fully aware that creating lots of data collections with `ToList()` in my sample code is not recommended, but there are use cases where it occurs. My question is about **what** causes the overhead exactly. You are making a circular argument. With garbage collection that pauses other threads for a moment, it all makes sense. When disabling GC, however, your argument is not valid. The cause of the problem turns out to be something else by logical reasoning. – Matt Feb 06 '18 at 10:07
  • @MattWolf, my assumption was that you would have read it, though the answer is still valid. The information was for anyone else reading. However, the "reveal" still stands: "in your situation it does impact execution time", and if you change the parameters of your test app slightly, it might not. This question really is a moving target, and without someone with full first-hand knowledge of the internals debugging your test app, I doubt you will get a satisfying answer. Anyway, put a nasty bounty on it and you might (maybe). I wish you the best of luck – TheGeneral Feb 06 '18 at 10:10
  • Sorry, but I do not follow your line of reasoning at all. I stated clearly that I succeeded in suppressing GC via LowLatency mode, but execution severely slows down. I do not see how you even address this question in the slightest. – Matt Feb 06 '18 at 10:12
  • @mjwills agreed, anyway all good, Matt Wolf happy thread testing and GC adventures – TheGeneral Feb 06 '18 at 10:16