55

I saw some post about implement GC in C and some people said it's impossible to do it because C is weakly typed. I want to know how to implement GC in C++.

I want some general idea about how to do it. Thank you very much!

This is a Bloomberg interview question my friend told me. He did badly at that time. We want to know your ideas about this.

andand
  • 17,134
  • 11
  • 53
  • 79
Josh Morrison
  • 7,488
  • 25
  • 67
  • 86
  • In the compiler/RTL? Or in userspace? The latter would necessarily involve a lot of programming discipline (i. e. no pointer-happy memory shuffling); it'd be cheaper to use some kind of smart pointer/ref counting scheme. It's more natural for C++. – Seva Alekseyev Feb 15 '11 at 21:45
  • I think the question is in user space... – Josh Morrison Feb 15 '11 at 21:46
  • These papers may interest you. [n2527](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2527.html) [n2585(pdf)](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2585.pdf) [n2586](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2586.html) [n2587](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2587.html) [n2670](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2670.htm) – Ise Wisteria Feb 15 '11 at 22:35

6 Answers6

61

Garbage collection in C and C++ are both difficult topics for a few reasons:

  1. Pointers can be typecast to integers and vice-versa. This means that I could have a block of memory that is reachable only by taking an integer, typecasting it to a pointer, then dereferencing it. A garbage collector has to be careful not to think a block is unreachable when indeed it still can be reached.

  2. Pointers are not opaque. Many garbage collectors, like stop-and-copy collectors, like to move blocks of memory around or compact them to save space. Since you can explicitly look at pointer values in C and C++, this can be difficult to implement correctly. You would have to be sure that if someone was doing something tricky with typecasting to integers that you correctly updated the integer if you moved a block of memory around.

  3. Memory management can be done explicitly. Any garbage collector will need to take into account that the user is able to explicitly free blocks of memory at any time.

  4. In C++, there is a separation between allocation/deallocation and object construction/destruction. A block of memory can be allocated with sufficient space to hold an object without any object actually being constructed there. A good garbage collector would need to know, when it reclaims memory, whether or not to call the destructor for any objects that might be allocated there. This is especially true for the standard library containers, which often make use of std::allocator to use this trick for efficiency reasons.

  5. Memory can be allocated from different areas. C and C++ can get memory either from the built-in freestore (malloc/free or new/delete), or from the OS via mmap or other system calls, and, in the case of C++, from get_temporary_buffer or return_temporary_buffer. The programs might also get memory from some third-party library. A good garbage collector needs to be able to track references to memory in these other pools and (possibly) would have to be responsible for cleaning them up.

  6. Pointers can point into the middle of objects or arrays. In many garbage-collected languages like Java, object references always point to the start of the object. In C and C++ pointers can point into the middle of arrays, and in C++ into the middle of objects (if multiple inheritance is used). This can greatly complicate the logic for detecting what's still reachable.

So, in short, it's extremely hard to build a garbage collector for C or C++. Most libraries that do garbage collection in C and C++ are extremely conservative in their approach and are technically unsound - they assume that you won't, for example, take a pointer, cast it to an integer, write it to disk, and then load it back in at some later time. They also assume that any value in memory that's the size of a pointer could possibly be a pointer, and so sometimes refuse to free unreachable memory because there's a nonzero chance that there's a pointer to it.

As others have pointed out, the Boehm GC does do garbage collection for C and C++, but subject to the aforementioned restrictions.

Interestingly, C++11 includes some new library functions that allow the programmer to mark regions of memory as reachable and unreachable in anticipation of future garbage collection efforts. It may be possible in the future to build a really good C++11 garbage collector with this sort of information. In the meantime though, you'll need to be extremely careful not to break any of the above rules.

teodozjan
  • 913
  • 10
  • 35
templatetypedef
  • 362,284
  • 104
  • 897
  • 1,065
  • 3
    "This means that I could have a block of memory that is reachable only by taking an integer, typecasting it to a pointer, then dereferencing it." - and no GC algorithm is going to cope with `intptr_t x = rand(); intptr_t y = ((intptr_t)p)^x; /* discard p and then some time later */; use_the_ptr((char*)(x^y));`. You need some co-operation from the user. – Steve Jessop Jun 17 '11 at 09:25
  • 5
    What new library functions in C++11 allow this? I'd love to know :) – djhaskin987 Jan 07 '16 at 18:20
4

Look into the Boehm Garbage Collector.

yan
  • 20,644
  • 3
  • 38
  • 48
4

C isn't C++, but both have the same "weakly typed" issues. It's not the implicit typecasts that cause an issue, though, but the tendency towards "punning" (subverting the type system), especially in data structure libraries.

There are garbage collectors out there for C and/or C++. The Boehm conservative collector is probably the best know. It's conservative in that, if it sees a bit pattern that looks like a pointer to some object, it doesn't collect that object. That value might be some other type of value completely, so the object could be collected, but "conservative" means playing safe.

Even a conservative collector can be fooled, though, if you use calculated pointers. There's a data structure, for example, where every list node has a field giving the difference between the next-node and previous-node addresses. The idea is to give double-linked list behaviour with a single link per node, at the expense of more complex iterators. Since there's no explicit pointer anywhere to most of the nodes, they may be wrongly collected.

Of course this is a very exceptional special case.

More important - you can either have reliable destructors or garbage collection, not both. When a garbage cycle is collected, the collector cannot decide which destructor to call first.

Since the RAII pattern is pervasive in C++, and that relies on destructors, there is IMO a conflict. There may be valid exceptions, but my view is that if you want garbage collection, you should use a language that's designed from the ground up for garbage collection (Java, C#, ...).

  • I don't see why there should be a conflict between RAII and GC. RAII is vastly better than GC for handling resources, but tracing garbage collectors are better than RAII in some scenarios involving immutable objects that don't hold resources (e.g. strings). Combining RAII with a tracing garbage collector will also deterministically trap many scenarios which would without a tracing GC cause untrappable Undefined Behavior (with normal C++ objects, trying to use an object after it's deleted is pure and simple UB; in a GC system, an object's methods can test if they're invoked on an object... – supercat Apr 01 '15 at 21:33
  • ...which has been asked to clean up its resources and is thus no longer usable, and can squawk if so. Unfortunately, languages seem to take an either/or approach rather than trying to combine them. – supercat Apr 01 '15 at 21:34
  • @supercat - personally, I would focus on the fact that reference cycles rarely if ever exist except as part of data structures. Even structures that, based on static types, could have reference cycles but never do in practice rarely occur except for data structures. Basically, there seems to be a need for potentially-cyclic references to objects that only (directly) manage memory and non-cyclic references to objects that may manage any resource. –  Apr 02 '15 at 13:37
  • A sufficiently powerful static type-system could manage that, maintaining compile-time proofs that cycles can only occur where valid so the reference-cycle issue never happens where any more complex cleanup is needed than freeing memory, but it doesn't happen in mainstream languages. I know (only just) enough about dependently typed languages to know that in principle one should be able to do it, but the programmer would have to write all the "proofs". –  Apr 02 '15 at 13:39
  • I didn't really believe this choice between RAII and GC was necessary even when I wrote this, but with the mainstream languages (and even most less-than-mainstream languages) currently available... –  Apr 02 '15 at 13:43
4

You could either use smart pointers or create your own container object which will track references and handle memory allocation etc. Smart pointers would probably be preferable. Often times you can avoid dynamic heap allocation altogether.

For example:

char* pCharArray = new char[128];
// do some stuff with characters
delete [] pCharArray;

The danger with the above being if anything throws between the new and the delete your delete will not be executed. Something like above could easily be replaced with safer "garbage collected" code:

std::vector<char> charArray;
// do some stuff with characters

Bloomberg has notoriously irrelevant interview questions from a practical coding standpoint. Like most interviewers they are primarily concerned with how you think and your communication skills than the actual solution though.

AJG85
  • 15,849
  • 13
  • 42
  • 50
  • You can do placement new to construct an object in previously allocated memory ... however the example was of unnecessary use of heap allocation the second bit was the part to "avoid dynamic heap allocation" as an alternative. – AJG85 Feb 15 '11 at 22:08
  • OK - makes sense. Pedantically, std::vector still goes to the heap for memory, but you don't have to worry about the cleanup. In (very) rare cases a possible efficiency issue, but as you say, still almost always the right thing to do. –  Feb 15 '11 at 22:20
  • Yes, still I'd almost always prefer to let the `std::allocator` handle memory management with resizing, reserving, growing, etc as I'm of a mind that the memory model design of STL containers is probably more efficient and stable than a homegrown implementation for the majority of cases. – AJG85 Feb 15 '11 at 22:29
  • @Steve: I don't see how there could be any efficiency issues if all it's doing is automating tasks. – GManNickG Feb 15 '11 at 22:29
  • @GMan - A simple C-style array on the stack is more efficient if you don't need resizing, mainly because heap management has some costs on each call to new and delete. But the difference is trivial in the vast majority of cases. –  Feb 15 '11 at 22:38
  • @Steve: "...if you don't need resizing..." Then you're comparing things that solve different problems, this is senseless. Compare a `std::vector` to other dynamic array approaches. – GManNickG Feb 15 '11 at 22:43
  • @GMan - the original example AJG85 used was a simple fixed-size C-style array - just on the heap instead of the stack. So no, not senseless in this case. Overhead for a feature you don't need might be trivial, but it's still overhead for a feature you don't need. –  Feb 15 '11 at 22:55
  • In a situation where I don't need resizing, I use a boost::shared_array. It makes sure the memory is cleaned up, and it allows me the functionality of an array. – hiddensunset4 Feb 15 '11 at 23:55
  • @Daniel - I find `boost::scoped_array` is safer than `boost::shared_array` in many cases by being non-copyable however it can't be used in STL containers for that same reason. This would be my boost alternative to using `std::vector` for performance sake. – AJG85 Feb 16 '11 at 15:22
  • 1
    @Gman & Steve - Try not to put my trivial example under too much scrutiny it was merely to illustrate a point. In some situations a little overhead for simplicity and stability is desirable. In some cases performance is the only concern. In both cases C++ has tools you can use to achieve what you want and so you are both right. – AJG85 Feb 16 '11 at 15:29
  • Theres nothing unsafe about using a boost::shared_array, they are very different in purpose. In most cases I want to use my array across several scopes; a shared_array does not 'copy' its contents, only the pointer. Remember they are 'smart pointers'. – hiddensunset4 Feb 17 '11 at 00:21
  • @Daniel I was referring to how it allows assignment and provides a copy constructor and can be used in STL containers where as `boost::scoped_array` does not. 'Safer' was the wrong word to use but this horse is perhaps significantly beaten off topic. I started a new thread to continue this: http://stackoverflow.com/questions/5026197/c-smart-pointers-comparisons-pros-cons-and-when-to-use – AJG85 Feb 17 '11 at 07:30
  • @Daniel - A C-style array on the stack still has destructor cleanup. I expect boost::shared_array and boost::scoped_array each have advantages, though, so thanks for the tip - it really is time I learned to use a few boost libraries properly. –  Feb 18 '11 at 19:27
  • Depending on the types your using, it will be better in most cases to use an std::vector vec(size); or vec.resize(size). It will yield the same performance as a C style array, it is copyable and it is exception safe. In essence, use what is best for your implementation. – hiddensunset4 Feb 21 '11 at 04:07
1

You can read about the shared_ptr struct.

It implements a simple reference-counting garbage collector.

If you want a real garbage collector, you can overload the new operator.

Create a struct similar to shared_ptr, call it Object.

This will wrap the new object created. Now with overloading its operators, you can control the GC.

All you need to do now, is just implement one of the many GC algorithms

Yochai Timmer
  • 48,127
  • 24
  • 147
  • 185
0

The claim you saw is false; the Boehm collector supports C and C++. I suggest reading the Boehm collector's documentation (particularly this page)for a good overview of how one might write a garbage collector in C or C++.

Gabriel Devillers
  • 3,155
  • 2
  • 30
  • 53
Marcelo Cantos
  • 181,030
  • 38
  • 327
  • 365