49

I keep hearing people complaining that C++ doesn't have garbage collection. I also hear that the C++ Standards Committee is looking at adding it to the language. I'm afraid I just don't see the point to it... using RAII with smart pointers eliminates the need for it, right?

My only experience with garbage collection was on a couple of cheap eighties home computers, where it meant that the system would freeze up for a few seconds every so often. I'm sure it has improved since then, but as you can guess, that didn't leave me with a high opinion of it.

What advantages could garbage collection offer an experienced C++ developer?

coppro
Head Geek
  • Can you describe what "RAII with smart pointers" is? – Craig Day Oct 23 '08 at 05:27
  • 14
    It's a powerful C++ idiom, and a well-known term in the C++ world. If you don't know, I would suggest asking a question (or searching, maybe there is one already). – coppro Oct 23 '08 at 05:30
  • He means that if you're strictly object-oriented, you can rely on delete[] to be called on your object when it falls out of scope or there are no more references to it, which should free() any memory and resources the object was holding onto. – Matt J Oct 23 '08 at 05:32

16 Answers

74

I keep hearing people complaining that C++ doesn't have garbage collection.

I am so sorry for them. Seriously.

C++ has RAII, and I always complain when I find no RAII (or a crippled RAII) in garbage-collected languages.

What advantages could garbage collection offer an experienced C++ developer?

Another tool.

Matt J put it quite well in his post (Garbage Collection in C++ -- why?): we don't need C++ features, as most of them could be coded in C, and we don't need C features, as most of them could be coded in assembly, etc. C++ must evolve.

As a developer: I don't care about GC. I tried both RAII and GC, and I find RAII vastly superior. As Greg Rogers said in his post (Garbage Collection in C++ -- why?), memory leaks are not so terrible (at least in C++, where they are rare if C++ is really used) as to justify GC instead of RAII. GC has non-deterministic deallocation/finalization and is just a way to write code that doesn't care about specific memory choices.

This last sentence is important: it is important to write code that "just doesn't care". In the same way that in C++ RAII we don't care about freeing resources because RAII does it for us, or about object initialization because the constructor does it for us, it is sometimes important to just code without caring about who owns what memory, and what kind of pointer (shared, weak, etc.) we need for this or that piece of code. There seems to be a need for GC in C++ (even if I personally fail to see it).

An example of good GC use in C++

Sometimes, in an app, you have "floating data". Imagine a tree-like structure of data, but no one is really "owner" of the data (and no one really cares about when exactly it will be destroyed). Multiple objects can use it, and then, discard it. You want it to be freed when no one is using it anymore.

The C++ approach is to use a smart pointer. boost::shared_ptr comes to mind. So each piece of data is owned by its own shared pointer. Cool. The problem arises when each piece of data can refer to another piece of data. You cannot use shared pointers alone, because they rely on a reference counter, which won't handle circular references (A points to B, and B points to A). So you must think a lot about where to use weak pointers (boost::weak_ptr) and where to use shared pointers.
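To make the shared/weak decision concrete, here is a minimal sketch of a tree whose nodes also point back to their parent. It uses std::shared_ptr and std::weak_ptr (the standard equivalents of the Boost classes named above); the Node and LeakyNode types and the nodes_left_after_release helper are hypothetical names invented for this illustration.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical node type: children are owned (shared_ptr), the parent link
// is a non-owning weak_ptr, so parent <-> child is not a cycle of strong
// references.
struct Node {
    static int alive;                              // number of live Node objects
    std::vector<std::shared_ptr<Node>> children;   // owning links
    std::weak_ptr<Node> parent;                    // non-owning back-pointer

    Node() { ++alive; }
    ~Node() { --alive; }
};
int Node::alive = 0;

// Same shape, but the back-pointer is a strong reference: every
// parent/child pair keeps each other alive.
struct LeakyNode {
    static int alive;
    std::vector<std::shared_ptr<LeakyNode>> children;
    std::shared_ptr<LeakyNode> parent;             // owning back-pointer: cycle

    LeakyNode() { ++alive; }
    ~LeakyNode() { --alive; }
};
int LeakyNode::alive = 0;

// Builds a root with two children, drops the only external handle, and
// reports how many nodes are still alive afterwards.
template <typename N>
int nodes_left_after_release() {
    {
        auto root = std::make_shared<N>();
        for (int i = 0; i < 2; ++i) {
            auto child = std::make_shared<N>();
            child->parent = root;
            root->children.push_back(child);
        }
    }   // root goes out of scope here
    return N::alive;
}
```

With the weak back-pointer, nodes_left_after_release<Node>() reports no survivors; with the strong one, all three LeakyNode objects stay allocated although nothing can reach them any more.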

With a GC, you just use the tree structured data.

The downside is that you cannot care about when the "floating data" will really be destroyed, only that it will be destroyed.

Conclusion

So in the end, if done properly and compatibly with the current idioms of C++, GC would be Yet Another Good Tool for C++.

C++ is a multiparadigm language: adding a GC will perhaps make some C++ fanboys cry treason, but in the end it could be a good idea, and I guess the C++ Standards Committee won't let this kind of major feature break the language, so we can trust them to do the work necessary to enable a correct C++ GC that won't interfere with C++. As always in C++, if you don't need a feature, don't use it and it will cost you nothing.

Community
paercebal
  • The one thing I hope we don't get is (Java-like) Phoenix objects, where the finalizer can make the object live again, but the second time it is garbage collected the finalizer is not run. – Martin York Oct 23 '08 at 07:35
  • As I understood, the C++09 would _facilitate_ garbage collection. – xtofl Oct 23 '08 at 12:31
  • http://www.artima.com/cppsource/cpp0x.html, an article by B. Stroustrup: "C++0x will most likely support optional garbage collection" – paercebal Oct 27 '08 at 22:44
  • Now, from Wikipedia: http://en.wikipedia.org/wiki/C%2B%2B0x#Transparent_garbage_collection : "Full garbage collection support has been remanded to a later version of the standard or a Technical Report." So I guess you're right. :-) – paercebal Oct 27 '08 at 22:53
  • 1
    C++'s RAII is limited and could be better. – Tim Matthews Feb 08 '09 at 22:55
  • xtofl: As I understood it, the newer standards of C++ would require C++ to facilitate GC /for_memory_alone/ - effectively nothing changes in C++, except that the memory for the object is not actually released. – Arafangion Mar 10 '09 at 04:26
  • 5
    @Ctrl Alt D-1337 : Could you give us some examples of "C++'s RAII is limited an could be better" ? Is there a language with a better RAII ? – paercebal Jun 10 '09 at 16:39
  • @paercebal - try/finally is useful sometimes (but can now be simulated with lambdas), but aside from that C++'s RAII would be hard to improve on. – Daniel Earwicker Aug 04 '09 at 07:00
  • 7
    More to the point, you need both RAII and GC. One doesn't preclude the other. Most languages with GC baked into them from the start also have RAII-like idioms as well. And you may think you don't *need* GC, but who would honestly reject greater convenience and higher productivity, if available? Pervasive GC makes you design and code in a different way, and your productivity rises. Another often ignored advantage is that it often performs better as well! – Daniel Earwicker Aug 04 '09 at 07:12
  • 11
    @Earwicker: The major languages with GC I know (i.e. non-script non-niche languages) are Java and C#. Java has no RAII whatsoever, and C#'s RAII is far from satisfying when coming from C++. Still, we have a common viewpoint: If we can afford it, working in a language where memory allocation is handled in the background saves a lot of time. – paercebal Aug 07 '09 at 11:03
  • @Earwicker: You're right about try/finally: sometimes we want code to execute no matter how we exit the scope, and writing a local struct just to have its destructor do the cleaning is painful. Another solution is to use Boost.ScopedExit, at http://www.boost.org/doc/libs/1_39_0/libs/scope_exit/doc/html/index.html ... – paercebal Aug 07 '09 at 11:06
  • @paercebal: FWIW, trees by definition cannot have cycles. Perhaps you meant "graph" over "tree". Otherwise good answer – Thomas Eding Dec 29 '12 at 07:15
  • 3
    @Thomas Eding : No, I do mean Tree. A (XML) DOM is a tree-like structure, but each node has (usually), in addition to a list of pointers to its child nodes, a pointer to its parent node (and perhaps even a pointer to its document node). Meaning two nodes always have a cyclic relation. One could handle it either with owner classes for each root (the document?), or with a mix of shared_ptr/weak_ptr, or with a GC, which means that you could hold part or all of the DOM without really caring which part is to be destroyed or not, only by setting some pointers to null... – paercebal Dec 29 '12 at 08:34
  • 2
    @Thomas Eding : ... The problem which comes with the GC is then, of course, you are holding one tiny DOM node without realizing the whole tree comes attached to it. This is a kind of leak. And then, if for some reason some hidden part of your code holds a pointer to that tiny node (some event listener, for example, like a delegate in C#, or an anonymous inner class in Java), then you really have a leak, no matter the GC... All in all, all kind of memory handling have their issues... The good thing about having a GC in C++ would be to have **the choice** of the memory handling... :-) – paercebal Dec 29 '12 at 08:37
  • 1
    D is a language where you do have RAII and GC – Quonux Oct 05 '13 at 13:17
  • Python has GC and RAII. Python implements RAII with context managers, which are a special protocol to ensure deterministic execution of setup and teardown code. – Sturla Molden Jan 30 '15 at 21:31
  • A simpler example of "floating" data is a string. In .NET or Java, passing around a string is no more expensive than passing around an integer or float, and requires neither data copying nor memory synchronization. Is there any way to design a string class in C++ that's anywhere near as efficient in multi-threaded scenarios? – supercat Feb 03 '15 at 18:03
  • @supercat : using a "const std::string &" ? If you don't modify the std::string, passing a const reference is ok, needs no copy nor synchronization... And if you want to change its value, then you are generating another string in all languages (with differing costs). And if you want to modify it (append data, change a character), then the C++ version would be more efficient. In the end, I'm not sure the immutable data is a good reason to justify a GC. Did I miss something? – paercebal Feb 04 '15 at 10:12
  • @paercebal: Consider the class `class moof { String cache; public String q(int a, String st) { if (trickyFunction(a)) cache = st; else st = cache; return st; } }`. In a GC language, the method never needs to copy (or do anything with) the contents of any string, and may be used on strings that are also used in other threads without need for memory synchronization. Could one write a C++ version with both those traits? – supercat Feb 04 '15 at 16:29
  • @paercebal: Immutable reference types are slower to "work with" than mutable types, since every operation requires creating a new object. On the other hand, immutable types in a GC system may be passed around as cheaply as primitives. In many cases where an object, once built, will be passed around a lot without modification, an efficient pattern is to use a mutable object to build up the state of an object and then build an immutable object encapsulating that state. This pattern ends up being a "win" if the object gets passed around three or more times, and can be a huge win... – supercat Feb 04 '15 at 17:46
  • ...if a large object gets passed around frequently. I think the pattern is rare in C++ because C++ can't handle it effectively, but in GC languages I think it can sometimes work out much more efficiently than anything that could be accomplished without GC. – supercat Feb 04 '15 at 17:48
  • @paercebal: I just added an answer with a more realistic example usage. – supercat Feb 04 '15 at 19:57
12

The short answer is that garbage collection is very similar in principle to RAII with smart pointers. If every piece of memory you ever allocate lies within an object, and that object is only referred to by smart pointers, you have something close to garbage collection (potentially better). The advantage comes from not having to be so judicious about scoping and smart-pointering every object, and letting the runtime do the work for you.

This question seems analogous to "what does C++ have to offer the experienced assembly developer? instructions and subroutines eliminate the need for it, right?"

Matt J
  • Glad it was taken in the spirit in which it was intended. I tend towards more manual methods myself :-) – Matt J Oct 23 '08 at 05:43
  • 3
    If you are using reference-counted smart pointers, beware of reference loops. One of the advantages of garbage collection is that it isn't confused by reference loops. – CesarB Oct 23 '08 at 12:58
  • 3
    If you make proper use of boost::weak_ptr, reference loops aren't a problem. – Head Geek Oct 26 '08 at 04:44
  • 3
    except that smart pointers free resources the moment they go out of scope, a GC can have them hang around for ages. It can make a big difference. – gbjbaanb Dec 28 '08 at 21:28
  • 1
    @Head Geek - if you make proper use of assembler language, etc. (see last paragraph of Matt J's answer). – Daniel Earwicker Aug 04 '09 at 07:01
  • @gbjbaanb: "the moment they go out of scope". Conversely, GCs can and do collect values before they go out of scope where as reference counted smart pointers can keep values alive until the end of their scope. – J D Jun 17 '13 at 12:21
  • "If every piece of memory you ever allocate lies within an object, and that object is only referred to by smart pointers, you have something close to garbage collection (potentially better)". Actually what you have literally is a form of garbage collection, albeit circa 1960. Naive scope-based reference counting is ~10x slower than a tracing GC but, in C++, you can elide some of the computational effort by hand to close the performance gap. http://flyingfrogblog.blogspot.co.uk/2011/01/boosts-sharedptr-up-to-10-slower-than.html – J D Jun 24 '13 at 20:20
9

With the advent of good memory checkers like valgrind, I don't see much use to garbage collection as a safety net "in case" we forgot to deallocate something - especially since it doesn't help much in managing the more generic case of resources other than memory (although these are much less common). Besides, explicitly allocating and deallocating memory (even with smart pointers) is fairly rare in the code I've seen, since containers are a much simpler and better way usually.

But garbage collection can potentially offer performance benefits, especially if a lot of short-lived objects are being heap allocated. GC also potentially offers better locality of reference for newly created objects (comparable to objects on the stack).

Greg Rogers
  • Greg, could you expand a little on your last paragraph? I thought this was the job of any memory allocator - even malloc - not just garbage collectors (which essentially figure out when to call free() for you). But I am no pro on this, would love a more detailed explanation. – SquareCog Oct 23 '08 at 06:07
  • One big potential performance advantage of gc is that you can allocate/free in one pass instead of many. It really depends on the situation: in some environments, manual memory allocation or RAII with custom allocators may be easier to handle than gc. – David Cournapeau Oct 24 '08 at 10:38
9

I don't understand how one can argue that RAII replaces GC, or is vastly superior. There are many cases handled by a gc that RAII simply cannot deal with at all. They are different beasts.

First, RAII is not bulletproof: it guards against some common failures which are pervasive in C++, but there are many cases where RAII does not help at all; it is fragile in the face of asynchronous events (like signals under UNIX). Fundamentally, RAII relies on scoping: when a variable goes out of scope, its destructor runs and its resources are freed automatically (assuming the destructor is correctly implemented, of course).
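For reference, the scoping rule RAII relies on can be sketched as follows. This is only an illustration under my own assumptions: Guard and run_and_throw are hypothetical names, and the "resource" is just an entry in a log so that the order of events can be inspected.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical guard type: records its construction and destruction in a
// log so the order of events can be inspected afterwards.
class Guard {
public:
    Guard(std::vector<std::string>& log, std::string name)
        : log_(log), name_(std::move(name)) {
        log_.push_back("acquire " + name_);
    }
    ~Guard() { log_.push_back("release " + name_); }

    Guard(const Guard&) = delete;
    Guard& operator=(const Guard&) = delete;

private:
    std::vector<std::string>& log_;
    std::string name_;
};

// Acquires two guards, then throws. Both destructors still run, in reverse
// order of construction, while the exception unwinds the stack.
std::vector<std::string> run_and_throw() {
    std::vector<std::string> log;
    try {
        Guard a(log, "a");
        Guard b(log, "b");
        throw std::runtime_error("boom");
    } catch (const std::runtime_error&) {
        log.push_back("caught");
    }
    return log;
}
```

The release entries appear before "caught" because stack unwinding destroys the guards on the way to the handler. The cleanup is tied to leaving the scope through normal control flow or an exception; nothing here covers a process that leaves the scope in some other way.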

Here is a simple example where neither auto_ptr nor RAII can help you:

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#include <memory>

using namespace std;

/* Set by the signal handler, polled by the main loop */
volatile sig_atomic_t got_sigint = 0;

class A {
        public:
                A() { printf("ctor\n"); }
                ~A() { printf("dtor\n"); }
};

/* Signal handler: only records that SIGINT arrived */
void catch_sigint (int /* sig */)
{
        got_sigint = 1;
}

/* Emulate expensive computation */
void do_something()
{
        sleep(3);
}

/* exit() does not unwind the stack, so destructors of local objects are skipped */
void handle_sigint()
{
        printf("Caught SIGINT\n");
        exit(EXIT_FAILURE);
}

int main (void)
{
        A a;
        /* auto_ptr is deprecated since C++11 and removed in C++17 */
        auto_ptr<A> aa(new A);

        signal(SIGINT, catch_sigint);

        while (1) {
                if (got_sigint == 0) {
                        do_something();
                } else {
                        handle_sigint();
                        return -1; /* not reached */
                }
        }
}

The destructor of A will never be called. Of course, it is an artificial and somewhat contrived example, but a similar situation can actually happen; for example when your code is called by other code which handles SIGINT and over which you have no control at all (concrete example: mex extensions in MATLAB). It is the same reason why finally in Python does not guarantee that something is executed. A gc can help you in this case.

Other idioms do not play well with this: in any non-trivial program, you will need stateful objects (I am using the word object in a very broad sense here, it can be any construction allowed by the language); if you need to control the state outside one function, you can't easily do that with RAII (which is why RAII is not that helpful for asynchronous programming). OTOH, a gc has a view of the whole memory of your process, that is, it knows about all the objects it allocated, and can clean up asynchronously.

It can also be much faster to use gc, for the same reasons: if you need to allocate/deallocate many objects (in particular small objects), gc will vastly outperform RAII, unless you write a custom allocator, since the gc can allocate/clean many objects in one pass. Some well-known C++ projects use gc, even where performance matters (see for example Tim Sweeney about the use of gc in Unreal Tournament: http://lambda-the-ultimate.org/node/1277). GC basically increases throughput at the cost of latency.
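The "custom allocator" alternative mentioned above can be sketched as a bump arena: allocating is a pointer increment, and everything is released in one pass. To be clear, this is not a garbage collector, only an illustration of the one-pass release idea; Arena, Point and make_points are hypothetical names, and the sketch assumes trivially destructible objects since no destructors are ever run.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical fixed-size bump arena. Nothing is freed individually;
// reset() releases every object at once. Only suitable for trivially
// destructible objects, because no destructors are called.
class Arena {
public:
    explicit Arena(std::size_t capacity)
        : buffer_(new unsigned char[capacity]), capacity_(capacity) {}

    // Returns storage for `size` bytes aligned to `align` (assumed to be a
    // power of two not larger than alignof(std::max_align_t)), or nullptr
    // when the arena is full.
    void* allocate(std::size_t size, std::size_t align) {
        std::size_t start = (used_ + align - 1) & ~(align - 1);
        if (start > capacity_ || size > capacity_ - start) return nullptr;
        used_ = start + size;
        return buffer_.get() + start;
    }

    void reset() { used_ = 0; }                  // frees everything in one step
    std::size_t used() const { return used_; }

private:
    std::unique_ptr<unsigned char[]> buffer_;
    std::size_t capacity_;
    std::size_t used_ = 0;
};

struct Point { int x; int y; };   // small, trivially destructible object

// Creates up to `count` points in the arena and returns the sum of their
// x fields.
int make_points(Arena& arena, int count) {
    int sum = 0;
    for (int i = 0; i < count; ++i) {
        void* mem = arena.allocate(sizeof(Point), alignof(Point));
        if (mem == nullptr) break;               // arena exhausted
        Point* p = new (mem) Point{i, -i};       // placement new, no heap call
        sum += p->x;
    }
    return sum;
}
```

A typical use would be one arena per request or per frame: fill it with short-lived objects, then call reset() once instead of freeing each object separately.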

Of course, there are cases where RAII is better than gc; in particular, the gc concept is mostly concerned with memory, and that's not the only resource. Things like files, etc. can be handled well with RAII. Languages without manual memory handling like Python or Ruby do have something like RAII for those cases, BTW (the with statement in Python). RAII is very useful when you need to control precisely when the resource is freed, and that's quite often the case for files or locks, for example.
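As a small sketch of the lock case, std::lock_guard releases a std::mutex at a precisely known point, on every exit path. The names g_mutex, g_balance, deposit and mutex_is_free_after are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

std::mutex g_mutex;   // hypothetical lock for the example
int g_balance = 0;    // hypothetical data protected by g_mutex

// Adds `amount` under the lock. The lock_guard releases the mutex on every
// exit path: normal return and exception alike.
void deposit(int amount) {
    std::lock_guard<std::mutex> hold(g_mutex);
    if (amount <= 0) throw std::invalid_argument("amount must be positive");
    g_balance += amount;
}

// Returns true when the mutex can be taken again after deposit() finished.
// (The standard allows try_lock to fail spuriously; in practice that is
// not expected on an uncontended mutex.)
bool mutex_is_free_after(int amount) {
    try {
        deposit(amount);
    } catch (const std::invalid_argument&) {
        // the guard inside deposit() has already unlocked the mutex
    }
    if (!g_mutex.try_lock()) return false;
    g_mutex.unlock();
    return true;
}
```

Whether deposit() returns or throws, the mutex is free again as soon as the call is over, which is the deterministic release that a collector running at some later time does not give you.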

David Cournapeau
  • "gc has a view of your entire process" - 99.99 of which is NOT a pointer to resource X. That's why RAII is good; it statically limits the number of places where relevant pointers could hide. And RAII also lets you control memory mgmt directly - just assign NULL to a smart pointer. – MSalters Oct 23 '08 at 14:28
  • 1
    You misunderstand what I mean: the fact that the gc can view the whole process memory and is not scope limited means it can free memory asynchronously, and free several objects "at once" (in one pass). The fact that RAII statically limits the scope of the resources is as much a problem as a feature. – David Cournapeau Oct 23 '08 at 14:38
  • 3
    It's closer to the truth to say that there are many cases RAII can handle that garbage collection cannot. GC concentrates on memory; RAII handles any kind of resource. And as far as I can tell, smart pointers eliminate your "fragility" argument. – Head Geek Oct 23 '08 at 15:50
  • 1
    If you wrap your resources in objects that can be collected then GC can handle resource management too. If you back this up with a disposable pattern then you can free resources you know you are finished with. – Luke Quinane Oct 23 '08 at 22:21
  • 1
    @Head geek: how does smart pointer handle signals, for example ? For memory, gc is much better than RAII in most cases. Almost every high level language uses gc, I think that's telling something. – David Cournapeau Oct 24 '08 at 10:28
  • @Quinane: depending on the resources, you want deterministic freeing. Typically, for files (or locks; although I think RAII does not work that well for threads either), you want to control exactly when you free the resource. – David Cournapeau Oct 24 '08 at 10:30
  • I added an example using a signal where neither smart pointers nor RAII free the resource. – David Cournapeau Oct 25 '08 at 08:22
  • 7
    @cournape: I'm sorry, but that example seems pretty bogus. Calling exit() in a signal handler wouldn't allow garbage collection to clean anything up either. – Head Geek Oct 25 '08 at 13:54
  • And RAII works quite well for thread locks. I use it for that reason regularly; in fact, that's the situation that introduced me to RAII as a concept. – Head Geek Oct 25 '08 at 13:55
  • @head geek: the example shows that RAII can fail, and that's its only intent. Of course, you would never use this in real code. But not returning to the caller after SIGINT happens in real code: think about your code being called by other code you can't control at all and which handles SIGINT itself – David Cournapeau Oct 26 '08 at 04:14
  • 2
    RAII can help for thread lock, I agree, but is no panacea. I guess I am really concerned with the idea that RAII is a miracle solution, which magically prevents deadlock, memory leak, etc... It is definitely useful, but it is not a magic stick. – David Cournapeau Oct 26 '08 at 04:25
  • 2
    RAII has its flaws, without a doubt. But I don't see GC as solving them, it just swaps one set of flaws for another. – Head Geek Oct 26 '08 at 04:47
  • 1
    Yes, they have different flaws, that's called a trade-off :) But gc solves problems that RAII cannot solve (out-of-scope persistence, as in my example, assuming it does something other than exiting right away); it is a very useful tool. Now, I am not sure it would be that useful for C++. – David Cournapeau Oct 26 '08 at 12:26
  • 2
    hmm, your example is just as bad with GC, except with GC even if the object was cleaned up at exit, it still wouldn't get its finaliser called (as the finalisation thread runs the 2nd time its collected). – gbjbaanb Dec 28 '08 at 21:32
8

The motivating factor for GC support in C++ appears to be lambda programming, anonymous functions etc. It turns out that lambda libraries benefit from the ability to allocate memory without caring about cleanup. The benefit for ordinary developers would be simpler, more reliable and faster compiling lambda libraries.

GC also helps simulate infinite memory; the only reason you need to delete PODs is that you need to recycle memory. If you have either GC or infinite memory, there is no need to delete PODs anymore.

MSalters
  • 2
    In other words, it's merely a crutch for inexperienced programmers? :-) – Head Geek Oct 23 '08 at 15:55
  • 8
    Only if you consider functional programming as something for inexperienced programmers, yes. gc is an extremely powerful tool, with a cost: like all powerful abstractions, it enables focusing on the problem at hand, but sometimes, it breaks and you have to go below the abstraction. – David Cournapeau Oct 25 '08 at 07:45
  • 2
    +1 for being the first correct answer I've seen so far. The technical term you're looking for is the "upwards funarg problem" and dates back almost half a century. http://dl.acm.org/citation.cfm?id=1093411 – J D Jun 17 '13 at 12:27
7

The committee isn't adding garbage-collection, they are adding a couple of features that allow garbage collection to be more safely implemented. Only time will tell whether they actually have any effect whatsoever on future compilers. The specific implementations could vary widely, but will most likely involve reachability-based collection, which could involve a slight hang, depending on how it's done.

One thing is, though, no standards-conformant garbage collector will be able to call destructors - only to silently reuse lost memory.

coppro
  • You're right, they aren't "adding garbage collection." I misread the article. – Head Geek Oct 23 '08 at 06:17
  • "Couple of features" is right: C++ won't **rely** on it, and it won't happen unless used (zero-cost abstraction). [Stroustrup](http://stroustrup.com/hopl-almost-final.pdf) ( [linked from page in Wikipedia](https://en.wikipedia.org/wiki/Outline_of_C%2B%2B) ) said `My view can be summarized as "C++ is such a good garbage-collected language because it creates so little garbage that needs to be collected".` – maxpolk Feb 03 '16 at 17:44
7

What advantages could garbage collection offer an experienced C++ developer?

Not having to chase down resource leaks in your less-experienced colleagues' code.

JohnMcG
  • 3
    Resource leaks simply can't happen if you insist that everyone use RAII and smart pointers. – Head Geek Oct 23 '08 at 15:53
  • 4
    But establishing and enforcing those as rules has a cost, and just having them as guidelines does not mean they will always be followed. – JohnMcG Oct 24 '08 at 16:48
  • 4
    And it is quite easy to leak memory when using RAII and smart pointers anyway. For example, a signal handler which changes the code path and never returns to the caller: neither RAII nor smart pointers will help you in that case. – David Cournapeau Oct 25 '08 at 08:12
  • Resource Acquisition Is Initialization: http://en.wikipedia.org/wiki/Resource_acquisition_is_initialization – David Cournapeau Oct 26 '08 at 04:28
  • 2
    For what it's worth, even a smart mark and sweep garbage collector like .NET's can't prevent all resource leaks. – FlySwat Oct 26 '08 at 05:05
  • I don't see GC as helping with resources. The biggest thing it helps with is allowing immutable-object references to be used as proxies for their content, without having to regard the content as a resource. – supercat Feb 27 '15 at 04:24
6

Garbage collection allows you to postpone the decision about who owns an object.

C++ uses value semantics, so with RAII, indeed, objects are recollected when going out of scope. This is sometimes referred to as "immediate GC".

When your program starts using reference semantics (through smart pointers etc.), the language no longer supports you; you're left to the wits of your smart pointer library.

The tricky thing about GC is deciding upon when an object is no longer needed.

xtofl
  • 2
    Smart pointers completely eliminate the need to decide who owns an object. – Head Geek Oct 23 '08 at 15:52
  • 7
    @Head Geek : Not exactly. If you have 2 objects, A and B, pointed through smart pointers, you're right. Now, if A points to B, too, and if B points to A, too, then you have a problem, and must decide who owns the object through the use of weak_ptr and/or shared_ptr. – paercebal Oct 27 '08 at 22:57
  • -1 You're perpetuating the myth that objects become unreachable after they go out of scope. – J D Jun 17 '13 at 12:25
  • @JonHarrop: In C++, _with value semantics_, they *are* unreachable when out of scope. I should have mentioned that. – xtofl Jun 21 '13 at 10:21
  • That is true but the converse (they are reachable if they are in scope) is not true in general. – J D Jun 22 '13 at 09:55
6

It's an all-too-common error to assume that because C++ does not have garbage collection baked into the language, you can't use garbage collection in C++, period. This is nonsense. I know of elite C++ programmers who use the Boehm collector as a matter of course in their work.

tragomaskhalos
  • Yes, I've seen several add-on garbage collection libraries. I just don't see why they're necessary or desirable in most cases. The answers to this question gave me a (very) few cases where having it might be desirable, which is why I asked. – Head Geek Oct 25 '08 at 13:59
5

Easier thread safety and scalability

There is one property of GC which may be very important in some scenarios. Assignment of a pointer is naturally atomic on most platforms, while creating thread-safe reference-counted ("smart") pointers is quite hard and introduces significant synchronization overhead. As a result, smart pointers are often said "not to scale well" on multi-core architectures.
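A rough sketch of where that overhead comes from: handing out a plain pointer touches no shared counter, while every copy of a std::shared_ptr updates the count in the shared control block. Config, g_current, read_raw and owners_while_copied are hypothetical names; the example is single-threaded and only makes the counter traffic visible, it does not measure anything.

```cpp
#include <atomic>
#include <cassert>
#include <memory>

struct Config { int version; };   // hypothetical shared, read-mostly data

// Publishing a plain pointer: one atomic store or load, no counter anywhere.
// Under a GC the collector would decide when the object dies; in this
// sketch the caller is assumed to keep the object alive.
std::atomic<const Config*> g_current{nullptr};

const Config* read_raw() {
    return g_current.load(std::memory_order_acquire);
}

// Copying a shared_ptr: each copy increments the counter in the shared
// control block and each destruction decrements it. For the count to stay
// correct across threads, those updates have to be synchronized, which is
// the overhead discussed above.
long owners_while_copied(const std::shared_ptr<Config>& p) {
    std::shared_ptr<Config> copy = p;   // counter goes up here
    return p.use_count();               // observed while `copy` is alive
}
```

Every reader that takes its own shared_ptr copy writes to the same counter, so many cores end up contending on one memory location even though the data itself is only read.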

Suma
  • 1
    That's a valid point, though not one I'd normally be worried about. When I do multithreaded programming, the threads rarely share their data structures. – Head Geek Jul 13 '09 at 22:53
  • 1
    Technically, reference counting scales better than other GC algorithms in the asymptote of infinite heap size. – J D Jun 17 '13 at 12:41
5

Garbage collection makes RCU lockless synchronization much easier to implement correctly and efficiently.

Ben Voigt
  • Actually, lots of lock-free and wait-free data structures and algorithms are much easier to implement with automatic memory management. But, arguably, that's pointless unless the allocator is lock-free or wait-free which it invariably isn't... – J D Jun 17 '13 at 12:30
  • @JonHarrop: Only if you assume a single allocator shared across all threads that becomes a bottleneck. Most high-performance allocation schemes, though they may have locking, use fine-grained locking. And as long as there's no allocation in the hot path of your lock-free/wait-free algorithm, it doesn't really matter that much anyway. – Ben Voigt Jun 17 '13 at 13:25
3

Garbage collection is really the basis for automatic resource management. And having GC changes the way you tackle problems in a way that is hard to quantify. For example when you are doing manual resource management you need to:

  • Consider when an item can be freed (are all modules/classes finished with it?)
  • Consider whose responsibility it is to free a resource when it is ready to be freed (which class/module should free this item?)

In the trivial case there is no complexity. E.g. you open a file at the start of a method and close it at the end. Or the caller must free this returned block of memory.

Things start to get complicated quickly when you have multiple modules that interact with a resource and it is not as clear who needs to clean up. The end result is that the whole approach to tackling a problem includes certain programming and design patterns which are a compromise.

In languages that have garbage collection you can use a disposable pattern where you can free resources you know you've finished with but if you fail to free them the GC is there to save the day.


Smart pointers are actually a perfect example of the compromises I mentioned. Smart pointers can't save you from leaking cyclic data structures unless you have a backup mechanism. To avoid this problem you often compromise and avoid using a cyclic structure even though it may otherwise be the best fit.

Luke Quinane
  • 1
    The problem is that the disposable pattern won't save you in all cases. In C#, the disposable pattern is a pain to implement correctly (as the finalizer can be called multiple times by different threads, etc.), and in Java, the "disposable" pattern is a joke. – paercebal Oct 23 '08 at 06:53
  • 2
    And again, proper use of smart pointers eliminates both of the problems you mention. – Head Geek Oct 23 '08 at 15:54
  • 2
    Proper use of the Boost::weak_ptr can eliminate problems with cyclic data structures too. It requires a full understanding of how your code works, but you should really have that kind of understanding regardless. – Head Geek Oct 25 '08 at 14:01
  • 1
    @Head Geek: Sometimes, you just don't want to care about some part of your code, in the same way you just don't want to care how the std::string allocates/frees its internal string. You want the data to be there as long as you use it, whatever how, and cleaned away when not used anymore. – paercebal Oct 27 '08 at 23:01
  • 1
    Finally, it should be the responsibility of the class designer to decide how it is to be freed, rather than having the user check the implementation/documentation of the class. – Arafangion Mar 10 '09 at 04:29
  • @HeadGeek "proper use of smart pointers". There is no such thing as "proper use" of smart pointers. Consider the problem of representing a mutable graph as a data structure and automatically reclaiming unreachable subgraphs. There is no "proper" way to solve that problem using smart pointers. – J D Jun 17 '13 at 12:46
2

using RAII with smart pointers eliminates the need for it, right?

Smart pointers can be used to implement reference counting in C++ which is a form of garbage collection (automatic memory management) but production GCs no longer use reference counting because it has some important deficiencies:

  1. Reference counting leaks cycles. Consider A↔B, both objects A and B refer to each other so they both have a reference count of 1 and neither is collected but they should both be reclaimed. Advanced algorithms like trial deletion solve this problem but add a lot of complexity. Using weak_ptr as a workaround is falling back to manual memory management.

  2. Naive reference counting is slow for several reasons. Firstly, it requires out-of-cache reference counts to be bumped often (see Boost's shared_ptr up to 10× slower than OCaml's garbage collection). Secondly, destructors injected at the end of scope can incur unnecessary-and-expensive virtual function calls and inhibit optimizations such as tail call elimination.

  3. Scope-based reference counting keeps floating garbage around as objects are not recycled until the end of scope whereas tracing GCs can reclaim them as soon as they become unreachable, e.g. can a local allocated before a loop be reclaimed during the loop?
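Point 1 can be reproduced in a few lines. The following is only a minimal sketch, assuming C++17 and a hypothetical `Node` type with an instance counter (neither comes from the posts above). It shows a two-node cycle of owning `std::shared_ptr` links surviving the end of scope, and the same shape being reclaimed once the back link is a `std::weak_ptr`.

```cpp
#include <cassert>
#include <memory>

// Hypothetical node type; counts live instances so a leak is observable.
struct Node {
    static inline int live = 0;         // C++17 inline static member
    std::shared_ptr<Node> strong_peer;  // owning link
    std::weak_ptr<Node> weak_peer;      // non-owning link
    Node() { ++live; }
    ~Node() { --live; }
};

// Builds A <-> B with owning links both ways and returns how many nodes
// are still alive after both local handles have gone out of scope.
int leaked_by_strong_cycle() {
    const int before = Node::live;
    std::weak_ptr<Node> observer;  // lets us reach A later without owning it
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->strong_peer = b;
        b->strong_peer = a;
        assert(a.use_count() == 2);  // held by the local `a` and by b->strong_peer
        observer = a;
    }  // each count drops from 2 to 1, never to 0
    const int leaked = Node::live - before;
    // Break the cycle by hand so this sketch does not really leak.
    if (auto a = observer.lock()) a->strong_peer.reset();
    return leaked;
}

// Same shape, but the back link is weak, so B does not keep A alive.
int leaked_by_weak_back_link() {
    const int before = Node::live;
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->strong_peer = b;
        b->weak_peer = a;
    }  // A's count reaches 0, A releases B, both are destroyed
    return Node::live - before;
}
```

In this sketch `leaked_by_strong_cycle()` reports 2 and `leaked_by_weak_back_link()` reports 0. Note that the programmer had to decide which direction is the owning one, which is the "manual" part the answer objects to.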

What advantages could garbage collection offer an experienced C++ developer?

Productivity and reliability are the main benefits. For many applications, manual memory management requires significant programmer effort. By simulating an infinite-memory machine, garbage collection liberates the programmer from this burden which allows them to focus on problem solving and evades some important classes of bugs (dangling pointers, missing free, double free). Furthermore, garbage collection facilitates other forms of programming, e.g. by solving the upwards funarg problem (1970).
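To make the upwards funarg point concrete, here is a small hedged sketch (C++11 or later; `make_counter` and `counter_demo` are hypothetical names). A function returns a closure whose state must outlive the frame that created it. Without a GC, the C++ programmer has to pick an ownership scheme for that state explicitly; here the closure shares it through a `std::shared_ptr`, whereas capturing a local by reference would dangle.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Returns a closure that keeps counting after make_counter has returned.
std::function<int()> make_counter(int start) {
    // The state lives on the heap and is owned by the closure itself,
    // so it stays valid for as long as any copy of the closure exists.
    auto count = std::make_shared<int>(start);
    return [count] { return (*count)++; };
}

// Usage sketch: a copy of the closure shares the same captured state.
void counter_demo() {
    auto next = make_counter(10);
    auto alias = next;       // copies the shared_ptr, not the int
    assert(next() == 10);
    assert(alias() == 11);   // sees the increment made through `next`
}
```

In a garbage-collected language the captured variable is simply kept alive by the collector, so no such decision is needed.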

J D
  • I don’t think (3) is a problem. Regarding the loop: yes, the scope ends at the end of the loop so storage of variables declared inside the loop doesn’t accumulate. In practice I think scope-based lifetime management is eminently practical and works amazingly well in most contexts. And yes, it fails in some (upwards funargs, complex multi-threaded data sharing …). – Konrad Rudolph Jun 17 '13 at 19:45
  • No consideration of `unique_ptr` -> fail. Your example of vcalls at the end of scope is a boost::shared_ptr specific thing, it does not have to be done at all. Also, GCs keep *a lot more* garbage around than RAII. Not to mention that GC has some very serious downsides, like non-deterministic destruction of resources, not being able to be used with non-GC memory/resources, etc. Also, your post is misleading because it implies that RAII users face dangling pointers, memory leaks, and double deletes, which they don't. In short, you're bad and wrong. – Puppy Jun 17 '13 at 20:28
  • @DeadMG: "a boost::shared_ptr specific thing". No, that applies to virtual destructors. "GCs keep a lot more garbage around than RAII". I measured the memory requirements of a vector graphics engine written in C++ and OCaml and found that the C++ required 5x more memory than the garbage collected OCaml because of this problem with scope-based collection. "RAII users face dangling pointers, memory leaks, and double deletes, which they don't". Using `unique_ptr` when there should be two owners can leave dangling pointers. Ref counting leaks. Thread unsafe `shared_ptr` can double delete. – J D Jun 17 '13 at 21:29
  • @JonHarrop: `shared_ptr` does not imply or require a virtual destructor. Using type erasure is a choice of user. Your memory anecdote is a logical fallacy (oddly, anecdote). It's trivially obvious that a garbage collector will always have more garbage around than a system that frees it the moment it's not referencable. There are no meaningful non-thread-safe `shared_ptr` implementations, `unique_ptr`'s interface doesn't permit double ownership unless you really try hard, and honestly, ref cycles never come up. – Puppy Jun 17 '13 at 22:02
  • 1
    "It's trivially obvious that a garbage collector will always have more garbage around than a system that frees it the moment it's not referencable". You're perpetuating a common memory management myth. Scope-based reference counting does not free at the earliest possible point. "There are no meaningful non-thread-safe shared_ptr implementations". See `BOOST_SP_DISABLE_THREADS`. "ref cycles never come up". Because you're making sacrifices in order to accommodate poor memory management strategies. Cycles are ubiquitous on the JVM and .NET. – J D Jun 18 '13 at 09:50
2

I, too, have doubts that the C++ committee is adding full-fledged garbage collection to the standard.

But I would say that the main reason for adding/having garbage collection in a modern language is that there are too few good reasons against garbage collection. Since the eighties there have been several huge advances in the field of memory management and garbage collection, and I believe there are even garbage collection strategies that could give you soft-real-time-like guarantees (like, "GC won't take more than .... in the worst case").

ADEpt
  • 4
    The real time argument is moot anyway, because malloc/free do not have worst case guarantee either. – David Cournapeau Oct 25 '08 at 08:25
  • I think you're making mutually exclusive assumptions. There are few arguments against state-of-the-art GCs but they require the compiled code and GC to work in perfect harmony which is virtually impossible with a language like C++. – J D Jun 17 '13 at 12:43
2

In a framework that supports GC, a reference to an immutable object such as a string may be passed around in the same way as a primitive. Consider the class (C# or Java):

public class MaximumItemFinder
{
  String maxItemName = "";
  int maxItemValue = -2147483647 - 1;

  public void AddAnother(int itemValue, String itemName)
  {
    if (itemValue >= maxItemValue)
    {
      maxItemValue = itemValue;
      maxItemName = itemName;
    }
  }
  public String getMaxItemName() { return maxItemName; }
  public int getMaxItemValue() { return maxItemValue; }
}

Note that this code never has to do anything with the contents of any of the strings, and can simply treat them as primitives. A statement like maxItemName = itemName; will likely generate two instructions: a register load followed by a register store. The MaximumItemFinder will have no way of knowing whether callers of AddAnother are going to retain any reference to the passed-in strings, and callers will have no way of knowing how long MaximumItemFinder will retain references to them. Callers of getMaxItemName will have no way of knowing if and when MaximumItemFinder and the original supplier of the returned string have abandoned all references to it. Because code can simply pass string references around like primitive values, however, none of those things matter.

Note also that while the class above would not be thread-safe in the presence of simultaneous calls to AddAnother, any call to getMaxItemName would be guaranteed to return a valid reference to either an empty string or one of the strings that had been passed to AddAnother. Thread synchronization would be required if one wanted to ensure any relationship between the maximum-item name and its value, but memory safety is assured even in its absence.

I don't think there's any way to write a method like the above in C++ which would uphold memory safety in the presence of arbitrary multi-threaded usage without either using thread synchronization or else requiring that every string variable have its own copy of its contents, held in its own storage space, which may not be released or relocated during the lifetime of the variable in question. It would certainly not be possible to define a string-reference type which could be defined, assigned, and passed around as cheaply as an int.
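For comparison, here is a rough sketch of what a memory-safe C++ counterpart might look like, assuming C++11 or later. The `StringRef` alias, the member names, and the choice of a mutex are my own illustrative assumptions, not the only possible design. It takes the "thread synchronization" branch of the trade-off described above: strings are shared through reference-counted handles, and every access goes through a lock.

```cpp
#include <cassert>
#include <climits>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical C++ counterpart of the class above.
class MaximumItemFinder {
public:
    // Shared, immutable string handle: copying it adjusts a reference count.
    using StringRef = std::shared_ptr<const std::string>;

    void addAnother(int itemValue, StringRef itemName) {
        assert(itemName != nullptr);  // callers must pass a real string
        std::lock_guard<std::mutex> lock(mutex_);
        if (itemValue >= maxItemValue_) {
            maxItemValue_ = itemValue;
            maxItemName_ = std::move(itemName);  // releases the previous name
        }
    }

    StringRef getMaxItemName() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return maxItemName_;  // the copy bumps the reference count
    }

    int getMaxItemValue() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return maxItemValue_;
    }

private:
    mutable std::mutex mutex_;
    StringRef maxItemName_ = std::make_shared<std::string>();  // empty string
    int maxItemValue_ = INT_MIN;
};
```

A caller would write something like `finder.addAnother(7, std::make_shared<std::string>("seven"))`. Compared with the C#/Java version, each assignment now costs a lock plus reference-count updates instead of a register load and store, which is the cost difference the answer is pointing at.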

supercat
  • I like the example. BTW, are you sure the assignment of references is atomic? (Note: After a bit of research, it seems so in C#, and I assume it is also in Java: For your example to work in C++, one would have to make the assignment of pointers atomic, too, either in the language - I don't think so - or by using std::atomic) – paercebal Feb 05 '15 at 09:55
  • @paercebal: Assignment of references in Java and .NET is just as atomic as an assignment of `int` [i.e. `a=b` performs an atomic read of `b` and an atomic write of `a`, though the operation as a whole may not be atomic]. A key requirement, which C++ pointers cannot satisfy, is that the GC must know about the register that `b` is read into during the assignment (if the thread performing the assignment gets swapped out for a while during which `b` gets overwritten and a GC cycle becomes necessary, that thread's register may be the only reference to `b` anywhere in the universe). – supercat Feb 05 '15 at 16:35
  • @paercebal: The only way I can see code like the above being practical and efficient in C++ with objects too big to copy efficiently would be if the GCreference class had some quick-to-access per-thread storage; in that scenario, `a=b` could translate into `currentThreadTempRef=b.data; a.data=currentThreadTempRef; currentThreadTempRef=null;`, and if the GC were to fire at any point in that process it could look through every thread's `currentThreadTempRef` and know that the objects identified thereby must be pinned. – supercat Feb 05 '15 at 16:50
  • That assumes that assignment itself is atomic. For example, let's say one thread does `b = c ;` while another, at the very same time, does `a = b ;`, b being a raw pointer shared by the two threads. At that moment b could be half-written by thread 1, fully read by thread 2, and then have its last half written by thread 1 (I don't assume we automatically have atomic assignment for pointers in standard, portable C++). This means thread 2 has an incorrect pointer value. Am I correct? – paercebal Feb 06 '15 at 16:56
  • @paercebal: You are correct that in addition to providing a form of thread-local storage, the platform would also have to guarantee that a pointer write followed by a read would always see either the old or new value; additionally, if one wants to use "ordinary" stores, one would have to have a means by which the GC could force other threads to flush their caches, and add an `if (gcBusy) gcWait();` check to the reference-assignment process. If one assumes the GC has those powers, requiring non-sliceable pointer assignments is minor by comparison. – supercat Feb 06 '15 at 17:09
  • @paercebal: In any case, my main point is that the method above represents a pattern which is commonplace, efficient, and 100% memory-safe in Java and .NET, but which would be very hard to support in memory-safe fashion in C++, even if an implementation has efficient thread-local storage available [a concept I wish was widely supported sufficiently efficiently that time-critical code wouldn't have to avoid it]. – supercat Feb 06 '15 at 17:14
2

Garbage Collection Can Make Leaks Your Worst Nightmare

Full-fledged GC that handles things like cyclic references would be somewhat of an upgrade over a ref-counted shared_ptr. I would somewhat welcome it in C++, but not at the language level.

One of the beauties of C++ is that it doesn't force garbage collection on you.

I want to correct a common misconception: the myth that garbage collection somehow eliminates leaks. From my experience, the worst nightmares of debugging code written by others and trying to spot the most expensive logical leaks involved garbage collection, in languages like Python embedded in a resource-intensive host application.

When talking about subjects like GC, there's theory and then there's practice. In theory it's wonderful and prevents leaks. Yet at the theoretical level, so is every language wonderful and leak-free since in theory, everyone would write perfectly correct code and test every single possible case where a single piece of code could go wrong.

Garbage collection combined with less-than-ideal team collaboration caused the worst, hardest-to-debug leaks in our case.

The problem still has to do with ownership of resources. You have to make clear design decisions here when persistent objects are involved, and garbage collection makes it all too easy to think that you don't.

Given some resource, R, in a team environment where the developers aren't constantly communicating and reviewing each other's code carefully at all times (something a little too common in my experience), it becomes quite easy for developer A to store a handle to that resource. Developer B does as well, perhaps in an obscure way that indirectly adds R to some data structure. So does C. In a garbage-collected system, this has created 3 owners of R.

Because developer A was the one that created the resource originally and thinks he's the owner of it, he remembers to release the reference to R when the user indicates that he no longer wants to use it. After all, if he fails to do so, nothing would happen and it would be obvious from testing that the user-end removal logic did nothing. So he remembers to release it, as any reasonably competent developer would do. This triggers an event that B handles, and B also remembers to release the reference to R.

However, C forgets. He's not one of the stronger developers on the team: a somewhat fresh recruit who has only worked in the system for a year. Or maybe he's not even on the team, just a popular third party developer writing plugins for our product that many users add to the software. With garbage collection, this is when we get those silent logical resource leaks. They're the worst kind: they do not necessarily manifest in the user-visible side of the software as an obvious bug besides the fact that over durations of running the program, the memory usage just continues to rise and rise for some mysterious purpose. Trying to narrow down these issues with a debugger can be about as fun as debugging a time-sensitive race condition.

Without garbage collection, developer C would have created a dangling pointer. He may try to access it at some point and cause the software to crash. Now that's a testing/user-visible bug. C gets embarrassed a bit and corrects his bug. In the GC scenario, just trying to figure out where the system is leaking may be so difficult that some of the leaks are never corrected. These are not valgrind-type physical leaks that can be detected easily and pinpointed to a specific line of code.

With garbage collection, developer C has created a very mysterious leak. His code may continue to access R which is now just some invisible entity in the software, irrelevant to the user at this point, but still in a valid state. And as C's code creates more leaks, he's creating more hidden processing on irrelevant resources, and the software is not only leaking memory but also getting slower and slower each time.
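The scenario can be approximated in C++ itself. In this hedged sketch, `std::shared_ptr` stands in for GC references (an approximation: it counts references rather than tracing), and the `Resource`, `OwningModule`, and `ObservingModule` types are hypothetical. When all three developers hold owning handles, C's forgotten handle silently keeps R alive. When only A owns R and the others hold `std::weak_ptr` handles, R dies when A releases it and C's stale handle is detectable.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical resource; flips `alive_` so its lifetime can be observed.
struct Resource {
    explicit Resource(bool& alive) : alive_(alive) { alive_ = true; }
    ~Resource() { alive_ = false; }
    bool& alive_;
};

// Each "developer" keeps handles in some data structure of their own.
struct OwningModule    { std::vector<std::shared_ptr<Resource>> handles; };
struct ObservingModule { std::vector<std::weak_ptr<Resource>> handles; };

// A, B and C all own R. A and B release it, C forgets.
// Returns whether R is still alive afterwards.
bool alive_after_owner_forgets() {
    bool alive = false;
    OwningModule a, b, c;
    {
        auto r = std::make_shared<Resource>(alive);
        a.handles.push_back(r);
        b.handles.push_back(r);
        c.handles.push_back(r);
    }
    a.handles.clear();
    b.handles.clear();
    // c.handles.clear() is the line developer C forgot to write.
    return alive;  // R lingers: a silent logical leak
}

// Only A owns R; C merely observes it and again forgets to clean up.
bool alive_after_observer_forgets() {
    bool alive = false;
    OwningModule a;
    ObservingModule c;
    {
        auto r = std::make_shared<Resource>(alive);
        a.handles.push_back(r);
        c.handles.push_back(r);
    }
    a.handles.clear();                    // the sole owner releases R
    assert(c.handles.front().expired());  // C's stale handle is detectable
    return alive;
}
```

Here `alive_after_owner_forgets()` yields true and `alive_after_observer_forgets()` yields false. The second version only works because someone decided up front that A is the owner, which is the design decision this answer argues GC makes too easy to skip.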

So garbage collection does not necessarily mitigate logical resource leaks. It can, in less than ideal scenarios, make it far easier for leaks to go silently unnoticed and remain in the software. The developers might get so frustrated trying to trace down their GC logical leaks that they simply tell their users to restart the software periodically as a workaround. It does eliminate dangling pointers, and in safety-obsessed software where crashing is completely unacceptable under any scenario, I would prefer GC. But I'm often working on less safety-critical but resource-intensive, performance-critical products where a crash that can be fixed promptly is preferable to a really obscure and mysterious silent bug, and resource leaks are not trivial bugs there.

In both of these cases, we're talking about persistent objects not residing on the stack, like a scene graph in 3D software or the video clips available in a compositor or the enemies in a game world. When resources tie their lifetimes to the stack, both C++ and any GC language tend to make it trivial to manage resources properly. The real difficulty lies in persistent resources referencing other resources.

In C or C++, you can have dangling pointers and crashes resulting from segfaults if you fail to clearly designate who owns a resource and when handles to it should be released (e.g., set to null in response to an event). Yet in GC, that loud and obnoxious but often easy-to-spot crash is exchanged for a silent resource leak that may never be detected.