I know how roots are found, but the thing is, (AFAIK) they have to be found at runtime. For that, you'll need a fixed size container that may overflow or a resizable container. I don't want to go with the fixed size container because it would not be easy to tell how much space to reserve (and it may be wasteful). The resizable container seems best, but the problem is, the GC runs when there is not enough space, so the resizable container won't be able to store what it needs to. So with these conditions, how are GC roots stored?
-
Why do you have to store them? It's not a facetious question. For example, every reference on a stack is a root, but they are all sitting there on the stack and they are not going anywhere (as long as you're doing a stop-the-world collection). – rici Jun 01 '20 at 00:08
-
@rici -- What stack are you talking of? The stack & heap stack? – xilpex Jun 01 '20 at 01:50
-
You do understand that GC roots are being found out at every GC invocation, right? Also GC structures are _outside_ the heap, while GC operates _on_ the heap. – Eugene Jun 01 '20 at 01:53
-
@Eugene -- Forgive my ignorance, but what do you mean by *GC roots are being found out at every GC invocation*? Do you mean *every allocation*? – xilpex Jun 01 '20 at 01:55
-
@Xilpex no. Suppose that "now" is the time to run the GC, your application has reached a point in time when GC needs to happen. Usually, the very first thing to do is to find out what are the GC roots. So you are correct, it has to be a resiazeable container, but since this container is outside the heap, it does not influence the heap memory. – Eugene Jun 01 '20 at 01:58
-
@Eugene -- Huh. I thought whenever there is an allocation you find a root. If you don't mind, could you explain how you find GC roots if that isn't how you find it? – xilpex Jun 01 '20 at 02:00
-
@Xilpex that depends on the language ( I might get the terms wrong for the ones I am not aware of ). In java, it's ( among other things ): static fields, stack references, jni references, etc. It's usually up to the language specification to define these – Eugene Jun 01 '20 at 02:06
-
1The most obvious one is : stack references. If something is on the stack, it is obviously in use. So that makes a perfect candidate for a root. To find these, thread stacks need to be scanned, of course. For this to be accurate, these are being scanned while your application is frozen. – Eugene Jun 01 '20 at 02:08
-
@xilpex: If you're looking for a reasonably easy-to-follow example, you could do worse than to look at the Lua implementation. – rici Jun 01 '20 at 03:02
1 Answers
A GC root is a location outside the heap that can contain a reference to an object inside the heap. A location can be anything that can store a reference. Usually it is four or eight bytes of memory storing 32 or 64 bit addresses but it can also be a machine register or space on disk. Sometimes a location is called a "slot" because you can "slot in" exactly one reference. A classical mark & sweep collector works by first marking all objects referenced by the roots and then continues the tracing from there.
Where and how roots are stored depends on the VM and gets very complicated when you consider things like partial GC, threading and JITting. But conceptually it's simple. Suppose you have a Python-like language with only functions and globals and the following code:
0: FOO = "hel"
1: BAR = "hi"
2: def foo(x):
3: y = x + "there"
4: <GC HERE>
5: return
6: def bar(x):
7: y = x + "lo"
8: foo(y)
9: return
10: bar(FOO)
11: ...
Assume GC happens on the indicated line. The callstack would look something like this:
Return address to line 11
Reference to object "hello"
Return address to line 9
Reference to object "hellothere"
The GC would scan through this callstack distinguishing between return addresses and references and marking the objects it finds. Then it would do the same for global references. They could be stored in a dictionary (hashmap) on the heap and referenced by a single root:
{name("FOO") : "hel", name("BAR") : "hi"}
Note that the space required for storing all roots is tiny. You only need eight bytes (one reference) for the globals and eight bytes for each element on the callstack. You can run out of stack space and get a stack overflow but with say 256kb preallocated for the stack and proper tail call optimization it is a non-issue.

- 19,221
- 20
- 87
- 122