Details of write barriers in the .Net Garbage Collector

Question

I have a large T[] in generation 2 on the Large Object Heap. T is a reference type. I make the following assignment:

T[0] = new T(..);

Which object(s) are marked as dirty for the next Gen0/Gen1 mark phases of GC? The entire array instance, or just the new instance of T? Will the next Gen0/Gen1 GC mark phase have to go through every item of the array? (That would seem unnecessary and very inefficient.)
Are arrays special in this regard? Would it change the answer if the collection were e.g. a SortedList<K, T> and I added a new, maximal item?

I've read through many questions and articles, including the ones below, but I still don't think I've found a clear answer. I'm aware that an entire range of memory is marked as dirty, not individual objects, but is the new array entry or the array itself the basis of this?

card table and write barriers in .net GC

Garbage Collector Basics and Performance Hints

That's a pretty good question. I didn't know that the write barrier worked with ranges. If that's the case, then it seems logical that part of the array gets invalidated (because it's contiguous memory), but there's really no reason for the whole array to be invalidated. Assuming a 128 bytes range like mentioned in the msdn article, and on a x64 architecture (8 bytes pointer size), that would mean that 16 items get invalidated. I don't think it should have a significant performance impact. But the article seems to use example values, I wonder what real ones are — Kevin Gosse, Dec 15 '17 at 09:04
Looking into the code of .net core GC (https://raw.githubusercontent.com/dotnet/coreclr/master/src/gc/gc.cpp), it seems to be much more complicated than implied in the article. Notably, there seem to be a "card bundle" mechanism that, quoting, "keeps track of groups of card words". I'm trying to figure out whether the card bundle is the "range" thingy mentioned in the article, or another mechanism on top of it — Kevin Gosse, Dec 15 '17 at 09:09
@KevinGosse, thanks. I'll assume that the GC manages inter-generational references in a smart way and that insertions near the end of a gen2 SortedList won't create a lot of work for the GC. — Eske Sparsø, Dec 18 '17 at 08:56

score 1 · Answer 1 · answered Feb 01 '19 at 12:27

Which object(s) are marked as dirty for the next Gen0/Gen1 mark phases of GC? The entire array instance, or just the new instance of T?

128B block containing the start of the array will be marked as dirty. The newly created instance (new T()) will be a new object, so it will first be checked through a Gen 0 collection without the card table.

For simplicity, presuming that the start of the array is aligned on a 128B boundary, this means the first 128B will be invalidated, so presuming T is a reference type and you're on a 64-bit system, that's first 16 items to check during the next collection.

Will the next Gen0/Gen1 GC mark phase have to go through every item of the array? (That would seem unnecessary and very inefficient.)

Just these 16 to 32 items, depending on the pointer size in this architecture.

Are arrays special in this regard? Would it change the answer if the collection were e.g. a SortedList and I added a new, maximal item?

Arrays are not special. A SortedList<K,T> maintains two arrays internally, so more blocks will end up dirty in the average case.

score 0 · Answer 2 · answered Feb 01 '19 at 12:09

pretty sure its tracking array slots, not the root which is holding reference to array object itself. btw if particual card is set dirty, it has to scan 4k of memory. ive read somewhere its now using windows' own mechanism which lets you get notifications if memory range in interest is written to.

Details of write barriers in the .Net Garbage Collector

2 Answers2