
Using the Visual Studio Concurrency Visualizer I now see why I don't get any benefit from switching to Parallel.For: only 9% of the time is the machine busy executing my code; the rest is 71% synchronization and 17% memory management (1).

Checking all the orange stripes on the diagram below, I discovered that the GC is always involved (2).

After reading all these interesting topics...

... am I right in assuming that all these threads have to contend for a single memory-management resource, and that removing the need to allocate objects on the heap will therefore improve my scenario considerably? For example, using structs instead of classes, arrays instead of dynamic lists, etc.?

I have a lot of work to do to bend my code in this direction. Just wanted to be sure before starting.

[Screenshot: Concurrency Visualizer timeline showing execution, synchronization and memory-management segments]

asked by abenci
  • Mind to share some code? – Fildor Oct 08 '20 at 06:17
  • "*These segments in the timeline are associated with blocking times that are categorized as Memory Management. This scenario implies that a thread is blocked by an event that is associated with a memory management operation such as Paging. During this time, a thread has been blocked in an API or kernel state that the Concurrency Visualizer is counting as memory management. These include events such as paging and memory allocation. Examine the associated call stacks and profile reports to better understand the underlying reasons for blocks that are categorized as Memory Management.*" – TheGeneral Oct 08 '20 at 06:21
  • Yes, allocating less will likely have a large benefit on your resources and efficiency, but that is almost ***always*** the case on *hot paths* and thrashed applications. – TheGeneral Oct 08 '20 at 06:22
  • @Fildor: It involves many classes and won't fit here, sorry. – abenci Oct 08 '20 at 06:24
  • On saying that, this would not be the only tool you would use to make your application less *allocatey*. A good *memory profiler* will go a long way, combined with learning how to read the results and effect changes based on them. Creating *minimal allocation* code is an *artform*, and one worth learning. – TheGeneral Oct 08 '20 at 06:27
  • I see. Then Michael's comments are basically your answer. – Fildor Oct 08 '20 at 06:28
  • `Just wanted to be sure before starting.` Short answer - you can't be 100% sure. Multithreaded code requires benchmarking - there are some techniques that _tend_ to work well, but they aren't guaranteed. And they may work well on some machines and workloads, but not others. Benchmarking will be essential. – mjwills Oct 08 '20 at 06:33

2 Answers


From your screenshot it seems like memory allocation is blocked while waiting for GC to complete. There are server and workstation GC modes, and each may be concurrent or not, but all options need to block threads at least a little while. I would check in more detail how often and how much time you are spending in GC, and how often gen 0/1 and gen 2 collections run.
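As a rough way to check this (a minimal sketch; `GcProbe` and `RunWorkload` are placeholders for your own code, and a proper profiler such as PerfView or dotnet-counters will give far more detail), you can read the GC settings and collection counts around the workload:

using System;
using System.Runtime;

class GcProbe
{
    static void Main()
    {
        Console.WriteLine($"Server GC:    {GCSettings.IsServerGC}");
        Console.WriteLine($"Latency mode: {GCSettings.LatencyMode}");

        // Snapshot collection counts before and after the workload under test.
        int gen0 = GC.CollectionCount(0), gen1 = GC.CollectionCount(1), gen2 = GC.CollectionCount(2);

        RunWorkload(); // placeholder for your Parallel.For code

        Console.WriteLine($"Gen 0 collections: {GC.CollectionCount(0) - gen0}");
        Console.WriteLine($"Gen 1 collections: {GC.CollectionCount(1) - gen1}");
        Console.WriteLine($"Gen 2 collections: {GC.CollectionCount(2) - gen2}");
    }

    static void RunWorkload() { /* your parallel code here */ }
}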

I believe that each thread gets its own allocation context within the ephemeral segment, so it does not need to synchronize ordinary allocations unless it needs a new segment or the allocation is on the large object heap. But I'm unable to find a definitive reference for this.

In any case, you will likely benefit from reducing the number and size of allocations. If possible, use an object pool or memory pool to reuse memory. You might also benefit from increasing the amount of available memory and checking the application for memory leaks. A general recommendation is that there should be two types of allocations:

  1. Small temporary allocations that only live for a short duration, like a temporary object that lives for the duration of a method call.
  2. Long lived allocations of any size that live for the duration of the "application".

If this pattern is followed, almost all garbage should be collected in gen 0/1, and gen 2 collections should be fairly rare.
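As a sketch of that pattern (the `Processor` type and its buffer size are illustrative, not from the original answer): the buffer is allocated once and lives with the object, while each call uses only stack locals and allocates nothing new.

using System;

public class Processor
{
    // Long-lived: allocated once and reused for the lifetime of the object/application.
    private readonly double[] _workBuffer = new double[1024];

    public double Process(double[] input)
    {
        // Per-call state is limited to stack locals, so the method itself allocates nothing.
        double sum = 0;
        int count = Math.Min(input.Length, _workBuffer.Length);
        for (int i = 0; i < count; i++)
        {
            _workBuffer[i] = input[i] * 2;   // reuse the long-lived buffer instead of allocating
            sum += _workBuffer[i];
        }
        return sum;
    }
}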

It also depends a bit on whether you are allocating many small objects or large chunks of memory. If the former, you may consider using structs, since these can be stored on the stack or inline in arrays. If the latter, you also need to consider memory fragmentation, which should also improve if you use a memory pool that only allocates fixed-size chunks of memory.
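For instance (the `Point3` type here is purely illustrative), structs stored in an array keep their data inline in a single allocation, whereas a `List<SomeClass>` allocates a separate heap object per element:

public struct Point3 { public double X, Y, Z; }   // value type: no per-instance heap object

public static class StructExample
{
    public static double SumX(int count)
    {
        var points = new Point3[count];            // one allocation for the whole array
        for (int i = 0; i < count; i++)
            points[i] = new Point3 { X = i };      // written in place, no extra GC pressure
        double sum = 0;
        foreach (var p in points) sum += p.X;
        return sum;
    }
}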

Edit:

At its very simplest, an object pool could be something like this:

using System;
using System.Collections.Concurrent;

public class ObjectPool<T>
{
    // Thread-safe store of pooled instances.
    private readonly ConcurrentBag<T> pool = new ConcurrentBag<T>();
    // Hand out a pooled instance if one is available, otherwise create a new one.
    public T Get(Func<T> constructor) => pool.TryTake(out var result) ? result : constructor();
    // Put an instance back into the pool for later reuse.
    public void Return(T obj) => pool.Add(obj);
}

This assumes that the objects represent identical resources, like byte arrays of some fixed size. But there are also existing implementations you can use instead of rolling your own.
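One such implementation that ships with .NET is `System.Buffers.ArrayPool<T>` (mentioned here as a readily available option, not necessarily the one originally linked); it rents and returns reusable buffers:

using System;
using System.Buffers;

class ArrayPoolExample
{
    static void Main()
    {
        // Rent a buffer of at least 4096 bytes from the shared pool.
        byte[] buffer = ArrayPool<byte>.Shared.Rent(4096);
        try
        {
            // ... use the buffer; note it may be larger than requested ...
        }
        finally
        {
            // Return it so other callers can reuse the same memory.
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }
}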

answered by JonasH
  • With _"use a object pool or memory pool to reuse memory"_ do you mean [this](http://stackoverflow.com/a/15092938/261010) or simply setting collection capacities to an initial value? – abenci Oct 08 '20 at 19:32
  • @abenci Setting initial capacities of collections is also a good idea, but I was referring more to something like the link. Added some details regarding object and memory pooling. – JonasH Oct 09 '20 at 08:07

Memory Management

The Memory Management report shows the calls where memory management blocks occurred, along with the total blocking times of each call stack. Use this information to identify areas that have excessive paging or garbage collection issues.

Furthermore:

Memory management time

These segments in the timeline are associated with blocking times that are categorized as Memory Management. This scenario implies that a thread is blocked by an event that is associated with a memory management operation such as Paging. During this time, a thread has been blocked in an API or kernel state that the Concurrency Visualizer is counting as memory management. These include events such as paging and memory allocation. Examine the associated call stacks and profile reports to better understand the underlying reasons for blocks that are categorized as Memory Management.

Yes, allocating less will likely have a large benefit on your resources and efficiency, but that is almost always the case on hot paths and in thrashed applications.

Heap allocations, and in particular Large Object Heap (LOH) allocations, are costly; they also create extra work for the garbage collector and can fragment your memory, causing even more inefficiency. The less you allocate (and the more you reuse memory or use the stack), the better off you are, in general.

This is also where you would learn to use a good memory profiler and get to know your garbage collector.

That said, this would not be the only tool you would use to make your application less allocatey. A good memory profiler will go a long way, combined with learning how to read the results and effect changes based on them.

Creating minimal-allocation code is an art form, and one worth learning.

Also, as @mjwills pointed out in the comments, you should run any change through your benchmark software as well; removing allocations at the cost of CPU time won't always make sense. There are a lot of ways to speed up code, and low allocation is just one of many approaches that may help.
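As a minimal sketch of that kind of check (the benchmark class and workloads below are illustrative, assuming the BenchmarkDotNet package is referenced), `[MemoryDiagnoser]` reports allocations per operation alongside the timings, so you can verify that a lower-allocation rewrite is actually faster:

using System.Collections.Generic;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser]
public class AllocationBenchmarks
{
    [Benchmark(Baseline = true)]
    public int WithList()
    {
        var items = new List<int>();                 // allocates and may resize several times
        for (int i = 0; i < 1000; i++) items.Add(i);
        return items.Count;
    }

    [Benchmark]
    public int WithArray()
    {
        var items = new int[1000];                   // single fixed-size allocation
        for (int i = 0; i < items.Length; i++) items[i] = i;
        return items.Length;
    }
}

public class Program
{
    public static void Main() => BenchmarkRunner.Run<AllocationBenchmarks>();
}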

Lastly, I would suggest following Marc Gravell and his blogs as a start (Mr DeAllocation), getting to know your garbage collector and how the generations work, and learning tools like memory profilers and benchmarkers, for performant, silky-smooth production code.

answered by TheGeneral (edited by halfer)
  • Thanks! _The less you allocate, or reuse memory, or use the stack the better you are (in general)_: will the 71% of synchronization time benefit from this as well? – abenci Oct 08 '20 at 06:58
  • @abenci most probably not. Reducing allocations is unlikely to have any effect to the synchronization overhead. So as the total amount of time goes down, the synchronization percentage will go up. To reduce the synchronization time you must improve the synchronization code, by minimizing the critical sections, or eliminating them altogether if possible. – Theodor Zoulias Oct 08 '20 at 08:13
  • @Theodor: Mmm, there are no critical sections in my code... – abenci Oct 08 '20 at 19:36
  • @Michael: I cannot find the references you provided, like _"Getting to know your Garbage Collector"_ can you add a link? Thanks. – abenci Oct 08 '20 at 19:56
  • @abenci Sorry, I was predisposed. There is no such reference; it was just a general remark. There are lots of good resources on the GC and how it works. – TheGeneral Oct 08 '20 at 20:36