I'm working on a hobby compiler/interpreter for a toy procedural language, and I've implemented most of the features I set out to explore except for a good garbage collection algorithm (similar to this guy). I've read quite a bit about various algorithms and I have a general idea of how to implement them. Earlier iterations of my language runtime used reference counting, but I dropped it to learn something more advanced, so I'm now considering a mark-and-copy compacting algorithm.
My first issue in getting started is preventing the algorithm from collecting 'objects' in native extension functions (i.e. functions written in C). The root set consists of 'objects' on the interpreter's stack and 'objects' in symbol tables, and I shouldn't have too much trouble with these. However, if a container 'object' is created in a C function and then populated with child 'objects', how can I prevent the GC from collecting them, since the container is not actually on the interpreter stack or bound to a symbol?
Things that make implementing GC easier:
- All 'objects' in my language are of a builtin type (i.e. the language is not object-oriented)
- The interpreter stack is just a stack of pointers to structs
- Symbol tables are just arrays of pointers to structs
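Because both root sources are plain arrays of pointers, one routine can mark from either of them. Below is a rough sketch of that idea, not my actual runtime: the Object layout and the names mark and mark_roots are made up for illustration, and it only shows the mark step, not the copy/compact step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical object header; the real struct layout is an assumption. */
typedef struct Object {
    bool marked;               /* set once the object is found reachable */
    size_t n_children;         /* number of child references (0 for scalars) */
    struct Object **children;  /* child objects, e.g. the elements of a list */
} Object;

/* Mark one object and everything reachable from it. */
void mark(Object *obj) {
    if (obj == NULL || obj->marked)
        return;
    obj->marked = true;
    for (size_t i = 0; i < obj->n_children; i++)
        mark(obj->children[i]);
}

/* Mark everything reachable from an array of pointers. The same function
   serves the interpreter stack and a symbol table, since both are just
   arrays of pointers to structs. Empty (NULL) slots are skipped by mark. */
void mark_roots(Object **slots, size_t count) {
    assert(slots != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        mark(slots[i]);
}
```

The collector would call mark_roots once for the interpreter stack and once per symbol table; anything left unmarked afterwards is garbage.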
User code:
f = open('words.txt', 'r');
lines = readlines(f);
close(f);
Interpreter (after parsing, compiling to bytecode...):
- push filename, open_mode
- call builtin_fopen, which returns a struct wrapping a FILE*
- store result in symbol f
- push symbol f
- call builtin_flines, which creates a list type l, then uses C fread to read each line of the file as a string type, appending it to the list l
- store result in symbol lines, and so on....
Now if the GC ran while one of the strings containing a line of the file was being allocated, the root set would not yet hold any reference to l, so l would get collected.
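To make the failure concrete, here is a minimal sketch of that scenario. It is a simplification rather than my real code: Obj, root_stack, gc_alloc, gc_collect and flines_hazard are all made-up names, the heap_full flag just forces a collection at a chosen point, and it uses a plain mark-and-sweep collector instead of a copying one to keep it short.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical object: either a list with a few items or a string stand-in. */
typedef struct Obj {
    bool marked;
    bool is_list;
    struct Obj *items[8];  /* list elements (fixed capacity for the sketch) */
    size_t n_items;
    struct Obj *next;      /* chain of every allocated object, for the sweep */
} Obj;

Obj *all_objects = NULL;   /* every live allocation */
Obj *root_stack[16];       /* stand-in for the interpreter stack */
size_t root_count = 0;
size_t total_freed = 0;    /* running count of objects the GC has freed */

void gc_mark(Obj *obj) {
    if (obj == NULL || obj->marked)
        return;
    obj->marked = true;
    for (size_t i = 0; i < obj->n_items; i++)
        gc_mark(obj->items[i]);
}

/* Mark from the roots, then free everything unmarked. Returns the number
   of objects freed and clears the mark bits of the survivors. */
size_t gc_collect(void) {
    for (size_t i = 0; i < root_count; i++)
        gc_mark(root_stack[i]);
    size_t freed = 0;
    Obj **link = &all_objects;
    while (*link != NULL) {
        Obj *obj = *link;
        if (obj->marked) {
            obj->marked = false;
            link = &obj->next;
        } else {
            *link = obj->next;
            free(obj);
            freed++;
        }
    }
    return freed;
}

/* Allocate an object; if the heap is "full", collect first. */
Obj *gc_alloc(bool is_list, bool heap_full) {
    if (heap_full)
        total_freed += gc_collect();
    Obj *obj = calloc(1, sizeof *obj);
    if (obj == NULL)
        abort();
    obj->is_list = is_list;
    obj->next = all_objects;
    all_objects = obj;
    return obj;
}

/* Mimics the start of builtin_flines. Returns how many objects the GC
   freed while the first line's string was being allocated. */
size_t flines_hazard(void) {
    size_t freed_before = total_freed;
    Obj *l = gc_alloc(true, false);  /* list l lives only in a C local */
    l->n_items = 0;                  /* l is still valid here */
    /* The string allocation finds the heap full and collects. Nothing in
       root_stack points at l, so the sweep frees it. */
    Obj *line = gc_alloc(false, true);
    /* l is dangling from here on; appending line to it would be a
       use-after-free. */
    (void)line;
    return total_freed - freed_before;
}
```

An object that was pushed onto root_stack beforehand survives the same collection, which is exactly the protection l is missing while it only exists inside the C function.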
Any ideas on how to handle this better?