4

I'm implementing mark-and-sweep garbage collection in a simple scripting language API I'm working on and have been reading about various implementations of garbage collection. API's such as Lua use mark-and-sweep with white, gray and black lists.

The problem is, I can't seem to find explanation of why there are such lists and why they are put into these specific colours.

In my current, trivial implementation, I simply use "dead" or "alive" states. In the sweep, the dead objects are deleted. I'm using the native heap so I'm not doing any moving in my GC.

I'm writing it in C.

// Performs a full garbage collection
void GorCollect(GorContext *ctx)
{
    Value *v, *end;
    Collectable *c, *n;

    // mark stack references
    end = ctx->stack + ctx->stackTop + 1;
    v = ctx->stack;
    while(v != end)
    {
        if (gvisgc(v) && v->v.gc) // mark if a collectable obj
            Mark(v->v.gc);
        v = v++;
    }

    // mark global references
    if (ctx->global)
        Mark((Collectable *)ctx->global); // ctx->global is a collectable obj

    // perform sweep
    c = ctx->gchead; // full list of collectable objs
    ctx->gchead = 0;
    while(c) {
        n = c->next;    
        // destroy unmarked collectable
        if (!c->marked)
            FreeCollectable(ctx, c);

        // rebuild gc list (singly-linked)
        else
        {
            c->marked = 0;
            if (!ctx->gchead)
                c->next = 0;
            else
                c->next = ctx->gchead;
            ctx->gchead = c;
        }
        c = n;
    }
}
Nick Bedford
  • 4,365
  • 30
  • 36
  • `why they are put into these specific colours` - because they're beautiful! – asaelr Feb 14 '12 at 23:27
  • Searching for "mark and sweep white gray black" led me to: http://www.memorymanagement.org/glossary/t.html#tri-color.marking . That page makes a point of mentioning that an important property of the algorithm is it's "correct", so my guess would be that the naive approach has some edge case where it breaks. – millimoose Feb 14 '12 at 23:30
  • Also: http://en.wikipedia.org/wiki/Mark_and_sweep#Naive_mark-and-sweep lists as the main disadvantage of the naive approach that it cannot be performed without suspending the process. – millimoose Feb 14 '12 at 23:32
  • I guess what I need to know is, given my simple black and white implementation, what would change to make it tricolor GC (besides the obvious GC flag for the mark state)? Instead of just pages talking about it in an abstract way. In my implementation, I have two "root sets", am I right? (the stack and the global object - object being a associative array style object). – Nick Bedford Feb 15 '12 at 00:46

2 Answers2

9

Gray means "live but not scanned": not all the descendants of a gray block have been marked as black yet. The gray color is necessary in an incremental garbage-collector. It helps a mark-and-sweep GC continue what it was doing the next time it gets a chance to do a bit of marking.

If your GC is non-incremental, then it may look to you like you don't necessarily need the gray color: you can simply always recurse on all the children of any live block you encounter. However, another more subtle issue is that this naive non-incremental recursive algorithm may overflow the stack. A gray color also helps by allowing to store information about what to visit next in the heap, instead of the stack.

Even if you use the gray color for this purpose, it doesn't prevent you from keeping a buffer of blocks that remain to be visited for efficiency. The only difference with the naive recursive implementation is that you stop updating the buffer when it's already full, and you scan the heap linearly for gray objects if the buffer becomes empty after having been full.

Pascal Cuoq
  • 79,187
  • 7
  • 161
  • 281
  • I'm trying to wrap my head around how the algorithm can work without a full scan. What I mean is, if I have an aggregate object marked as grey, but one of it's child object's is marked as white, how does one avoid having to traverse the entire tree to ensure that white object is not referenced elsewhere. It doesn't really make sense how you can get away with not scanning the entire reference hierarchy. Unless the mark phase is only incremental and the sweep must done once the mark is fully scanned? In which case, you'd have to managed new references in-between marks. – Nick Bedford Feb 17 '12 at 01:07
  • "Unless the mark phase is only incremental and the sweep must done once the mark is fully scanned?" Of course, in an incremental mark-and-sweep GC the algorithm still alternates between a complete mark phase and a complete sweep phase. That's how mark-and-sweep works. I believe that the article "Uniprocessor Garbage Collection Techniques", by Paul Wilson, should answer your questions. – Pascal Cuoq Feb 17 '12 at 08:14
0

First of, they are sets, not lists and each object in the heap is at any time in exactly one of the sets.

Second, in any mark & sweep implementation they are being used, but they may be implied. You don't provide the implementation for Mark, but in that function you are moving your objects in your sets.

Here is the implementation for the mark phase of my garbage collector:

/* Initally, the white set contains all objects, the black and
   grey sets are empty. */
stack *st = m->mark_stack;
/* First all roots are added to the gray set. */
for (size_t i = 0; i < m->roots->used; i++) {
    ptr p = m->roots->array[i];
    if (p != 0) {
        /* Mark the root and move it to the gray set. */
        AT(p) |= 1;
        st_push(st, p);
    }
}
while (st->used) {
    /* Think of popping the object from the mark stack as moving
       it from the gray to the black set. */
    ptr p = st_pop(st);
    P_FOR_EACH_CHILD(p, {
        if (!GET_MARK(p_child)) {
            /* Then mark each non-black child and move it to the
               gray set. */
            AT(p_child) |= 1;
            st_push(st, p_child);
        }
    });
}
/* When control has reached this point, the gray set is empty and
   the whole heap has been divided into black (marked) and white
   (condemned) objects. */

We could instead use explicit data structures for the three sets. But for an stop-the-world mark & sweep gc, this implementation is much more efficient.

Björn Lindqvist
  • 19,221
  • 20
  • 87
  • 122