
Sorry for bringing up the "why collect?" question again. In a C# process, it is generally discouraged to trigger a garbage collection manually in order to free RAM. I overall agree.

To do it you would need to run something like this:
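
(A minimal sketch. Note that GCSettings.LargeObjectHeapCompactionMode only exists since .NET 4.5.1; on plain .NET 4.5 only the GC.Collect() call applies.)

using System;
using System.Runtime;

// Ask the next blocking collection to also compact the large object heap (LOH),
// then force a full blocking collection.
GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
GC.Collect();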

Several reasons are invoked as to why this is useless (and possibly counterproductive). I overall agree, but I can't help wondering whether never collecting is dangerous.

If we don't do it, it can happen (at least with server GC and .NET 4.5) that the system dedicates several GBs of physical memory to your process that are not really used by living objects. In particular, the LOH might never be compacted and may hold a lot of RAM that is actually free from the C# memory manager's point of view. I see this every day with processes that have 20 GB of physical memory dedicated by the system, used during a peak of RAM-intensive computations hours ago, while only a small part is still in use.

As far as I can see, there is still no clear performance problem on the machines. It is usually argued that this is never a problem, and I (almost) agree. The main point of the argument is that the "free" RAM in the LOH is not accessed by the process and is thus moved to disk by the system (Windows), at least when RAM runs low. This answer explains it very well: When is memory, allocated by .NET process, released back to Windows

From the point of view of the other processes, though, there is still some cause for worry.

The first point I see is that if another process has an urgent need for RAM, the system needs time to transfer the unused memory to disk. A preventive compaction would have avoided that delay.

But there is another point that is more worrying. Can the system really move the unused RAM to disk? What if the LOH is fragmented, with 99% free space interleaved with 1% used space? I guess the system works with large segments, so that almost no segment would actually be "0% used". Maybe it's more complicated, or maybe this is wrong. What do you know about it?

I'm interested in people who have some experience with this. Have you observed cases where the theory "you don't need to collect" goes wrong, and where for some reason it proved to be healthy to do it (and you know why)?

Benoit Sanchez
  • Xamarin + Android using Bitmaps: if you use them for a ListView, for example, and you don't GC.Collect() from time to time, the app will crash. – Gusman May 18 '18 at 12:05

2 Answers


In modern OSs, you don't allocate RAM [1]. You (your process) allocate within your own address space. How the OS backs that up (when necessary) with physical RAM is largely transparent.

And what you've done in your address space is completely irrelevant to other processes.

See also: Raymond Chen's provocatively titled It's the address space stupid [2]

Can the system really move the unused RAM to disk?

It's not a matter of "used" or "unused" RAM. When the OS [3] is running low on physical pages it'll identify pages to evict, using some suitable strategy (e.g. Least Recently Used is a simple one to get your head around). Then it has to decide what to do with the page.

If it's lucky, the page it has picked on is either backed by a file image (e.g. it's part of the executable) or has not been changed since it was last written to disk. If that's the case, it can evict the page without performing any additional I/O. If it's unlucky, it'll schedule the I/O and then continue its hunt for pages to free.

It's doing this all of the time [4]. And it's also doing other things like using otherwise unused pages to act as a filesystem cache.
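
To make the decision concrete, here's a toy sketch in C# of the logic described above (an illustration of the idea only, not of how Windows actually implements it):

using System;
using System.Collections.Generic;
using System.Linq;

enum PageState { FileBacked, CleanInPageFile, Dirty }

class CandidatePage
{
    public PageState State;
    public DateTime LastAccess;   // what a Least-Recently-Used policy looks at
}

static class EvictionSketch
{
    // Pick the page that was touched longest ago (LRU).
    public static CandidatePage PickVictim(IEnumerable<CandidatePage> pages) =>
        pages.OrderBy(p => p.LastAccess).First();

    // File-backed pages and pages unchanged since their last trip to disk can be
    // dropped without any I/O; only dirty pages must be written out first.
    public static bool NeedsWriteBeforeEviction(CandidatePage victim) =>
        victim.State == PageState.Dirty;
}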

The only time you've got a real issue is when the sum total of all pages which are being actively worked on (the sum of each process's working set) is greater than the number of pages of RAM which are generally available to processes. At that point you'll have thrashing, since the system is continually writing pages to disk and retrieving them. But at that point, GC almost certainly won't help, since we know that all of these pages are being actively worked with - they all contain reachable objects.


[1] There are ways to ask for address space that is always backed by real RAM and never swapped, but that's not what's typically done and you have to deliberately go out of your way to make this happen.

[2] I'm aware that it's a riff on a political quote, but that doesn't mean all readers here will necessarily recognise that, and some people in the past have felt that my recommending this article was me literally calling them stupid.

[3] I'm trying to be general here, but I recognise that I may be straying from what "any" OS will do into what Windows specifically does.

[4] PDC10: Mysteries of Windows Memory Management Revealed: Part Two

Damien_The_Unbeliever
  • I used the word "allocate" incorrectly. I updated my question. – Benoit Sanchez May 18 '18 at 12:08
  • "But at this point, GC almost certainly won't help since we know that all of these pages are being actively worked with - they all contain reachable objects." My point is the GC will help because it will compact. Before compaction you may have pages totalling 1GB containing 1M reachable small objects. After compaction your pages will total to 10 MB. Same for the LOH. Or there is some point I'm missing. – Benoit Sanchez May 18 '18 at 13:52

I've been working for a while on this question. The conclusion is:

  • LOH fragmentation matters only a little in terms of RAM consumption
  • this "a little" is however important in some scenarios

This article says it can be useful to compact the large object heap: Large Object Heap Compaction: Should You Use it?. It says a lot of true things and is practically useful, but it is a bit shallow. The usefulness of compacting the LOH must be studied together with the system's paging of virtual memory.

Summing up the article:

By default the large object heap is never compacted. Compacting the LOH can improve the performance of your own process by making large object allocation faster. Most importantly, it allows other processes to use more RAM. But it has a cost: compacting the LOH is not negligible. That's why the default behaviour is not to compact it. You have to measure the pros and cons and find an appropriate strategy to trigger the compaction, for example every XX minutes.
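
A periodic trigger could look like the sketch below (the 30-minute period is purely a placeholder for the "XX minutes" you would pick after measuring on your own workload):

using System;
using System.Runtime;
using System.Threading;

static class PeriodicLohCompaction
{
    private static Timer _timer;

    public static void Start(TimeSpan period)
    {
        // Every `period`, request a LOH compaction with the next blocking
        // collection and trigger that collection immediately.
        _timer = new Timer(_ =>
        {
            GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
            GC.Collect();
        }, null, period, period);
    }
}

// Example: PeriodicLohCompaction.Start(TimeSpan.FromMinutes(30));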

Now I will explain why this is almost false. You need to read this article explaining how virtual memory is managed: Physical and Virtual Memory in Windows 10.

If some blocks inside the LOH are free, they are never accessed, so the system is capable of moving those blocks out of RAM. Every active (= used) block in the LOH, i.e. one occupied by a real object, has a size greater than 85 KB. The page size is 4 KB (on both 32-bit and 64-bit Windows), and each page can be moved out of RAM individually. Simple reasoning shows that the proportion of free space in the LOH that cannot be moved out of RAM is always small: either you have large free blocks, and then most of their pages can be moved out of RAM, or you have small free blocks, and then their size is small compared to the used blocks. In either case, you don't have much free space in the LOH that can't be moved out of RAM.
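
Here is a back-of-the-envelope sketch of that reasoning. It assumes 4 KB pages and ignores the LOH's own bookkeeping, and the block sizes are hypothetical examples: a free block can stay stuck in RAM only at its two ends, where it shares a page with a used block, so at most about 8 KB per free block cannot be paged out.

using System;

static class LohFreeSpaceBound
{
    public static void Main()
    {
        const double pageSize = 4 * 1024;        // 4 KB
        const double minLohObject = 85 * 1024;   // smallest possible LOH object

        // Hypothetical patterns of (used block size, free block size) in bytes.
        var patterns = new[] { (minLohObject, 100 * 1024.0), (minLohObject, 1e6), (1e6, 100e6) };

        foreach (var (used, free) in patterns)
        {
            // At most ~2 partial pages of each free block must stay resident.
            double stuck = Math.Min(free, 2 * pageSize);
            double fraction = stuck / (used + free);
            Console.WriteLine($"used {used / 1024:F0} KB / free {free / 1024:F0} KB -> " +
                              $"non-evictable free space is {fraction:P2} of the heap");
        }
    }
}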

I tested it. I wrote a process creating a fragmented LOH of 10 GB where only 1% is active (100 MB), interleaving active blocks of 1 MB with free blocks of 100 MB. Only this 1% was memory actively used for computation. On my 16 GB laptop, I could run three of these processes without a problem, and they were not slow. When the first one started, the system gave it a working set of 10 GB. When the second process started, this working set was reduced without much impact on the speed of either process. Same for the third. After a few minutes, the working set of each process was 100 MB as expected; only their virtual memory was 10 GB.

The cruise speed of each process was close to optimal. But the system needed to write these GBs to disk, which really slowed down starting new processes. It took time and resources, and the page file grew. Compacting the LOH of each process would have been a better strategy. So yes, compacting the LOH is better, but the default strategy of the system is rather good. There might be some more subtle issues, for example: what happens exactly when you re-use a free block that was moved to disk, does the system need to read it back from the file?

There is also a practical concern unrelated to performance. When you work in a company running hundreds of processes on hundreds of machines, you want to monitor and detect potential RAM overflows. With fragmented heaps, measuring the size of the working set is misleading, and .NET does not give easy access to the "total size of reachable objects". So when you see a process "using" 10 GB (working set size), you don't know whether these 10 GB are really needed or not. Compacting the LOH helps make the "working set size" indicator practically usable.
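
As a sketch of the kind of monitoring I mean (the managed-heap figure from GC.GetTotalMemory is only a rough approximation of the reachable size, which is precisely part of the problem):

using System;
using System.Diagnostics;

static class MemoryIndicators
{
    public static void Report()
    {
        using (var me = Process.GetCurrentProcess())
        {
            // Physical memory the OS currently dedicates to this process.
            long workingSet = me.WorkingSet64;

            // Bytes currently allocated on the managed heap; passing true forces a
            // collection first so the figure is not inflated by dead objects.
            long managed = GC.GetTotalMemory(forceFullCollection: true);

            Console.WriteLine($"working set: {workingSet / (1024 * 1024)} MB, " +
                              $"managed heap after GC: {managed / (1024 * 1024)} MB");
        }
    }
}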

Note: this is the code used for my experimentation:

using System;

public static class ExampleFragmentation
{
    public static void Main()
    {
        // Build a fragmented LOH: 100 live 1 MB arrays interleaved with
        // 100 dead 100 MB arrays (about 10 GB in total, so run this as x64).
        var mega = 1024 * 1024;
        var smallArrays = new int[100][];
        var largeArrays = new int[100][];
        for (int i = 0; i < 100; i++)
        {
            smallArrays[i] = CreateArray(mega);
            largeArrays[i] = CreateArray(100 * mega);
        }
        // Drop the large arrays: they become ~10 GB of free blocks in the LOH,
        // interleaved with the still-reachable small arrays.
        largeArrays = null;

        // Keep only the small arrays busy and measure the computation speed.
        while (true)
        {
            DateTime start = DateTime.Now;
            long sum = 0;
            for (int i = 0; i < 100; i++)
                foreach (var array in smallArrays)
                    foreach (var element in array)
                        sum += element;
            double time = (DateTime.Now - start).TotalSeconds;
            Console.WriteLine("sum=" + sum + "  " + time + " s");
        }
    }

    private static int[] CreateArray(int bytes)
    {
        // Allocate an int[] of the requested size (in bytes) and touch every element.
        var array = new int[bytes / sizeof(int)];
        for (int i = 0; i < array.Length; i++)
            array[i] = 1;
        return array;
    }
}
Benoit Sanchez