19

Most modern languages have built-in garbage collection (GC), e.g. Java, the .NET languages, Ruby, etc. GC certainly simplifies application development in many ways.

I am interested in the limitations/disadvantages of writing applications in garbage-collected languages. Even assuming the GC implementation is optimal, I wonder whether GC limits the optimization decisions we can make.

casperOne
rpattabi
  • http://stackoverflow.com/questions/1424660/garbage-collection-vs-non-garbage-collection-programming-languages – Bertrand Marron Jul 09 '10 at 17:16
  • @tusbar Though that question has a generic title, it only asks about the mistakes developers would make when programming in non-GC languages. – rpattabi Jul 09 '10 at 17:23

8 Answers

25

The main disadvantages to using a garbage collector, in my opinion, are:

  1. Non-deterministic cleanup of resources. Sometimes it is handy to say "I'm done with this, and I want it cleaned up NOW". With a GC, this typically means either forcing the GC to clean up everything or just waiting until it's ready - both of which take some control away from you as a developer.

  2. Potential performance issues which arise from non-deterministic operation of the GC. When the GC collects, it's common to see (small) hangs, etc. This can be particularly problematic for things such as real-time simulations or games.
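
Point 1 can be made concrete with a small sketch (the `TempFile` class below is hypothetical, not a real library type): try-with-resources gives back deterministic "clean it NOW" behavior for a resource, while plain GC-managed objects are reclaimed whenever the collector gets around to them.

```java
// Sketch of deterministic vs. GC-driven cleanup (TempFile is a made-up class).
public class CleanupDemo {
    static boolean closed = false;

    // A resource with an explicit, deterministic release point.
    static class TempFile implements AutoCloseable {
        @Override
        public void close() { closed = true; } // runs exactly when the block exits
    }

    public static void main(String[] args) {
        try (TempFile f = new TempFile()) {
            // use f ...
        }
        // Cleanup has already happened here -- no need to wait for (or force) the GC.
        System.out.println("closed=" + closed); // prints closed=true
    }
}
```

The same idea appears as `using` in C# and RAII in C++; it recovers determinism for resources, but only for objects you remember to scope this way.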

Reed Copsey
    +1 *Adds GC to the list of reasons he dies in games* – corsiKa Jul 09 '10 at 17:24
  • @glowcoder: Isn't it great? I love being able to have something to blame for that... ;) – Reed Copsey Jul 09 '10 at 17:24
  • Is Non-deterministic cleanup a common GC problem? or specific to particular GC implementations? – rpattabi Jul 09 '10 at 17:52
  • @ragu.pattabi: GC's, by their nature, tend to nearly always have non-deterministic cleanup. That's really a major point to the GC in the first place - you don't worry about when or how memory is freed, and leave it to the collector. – Reed Copsey Jul 09 '10 at 17:53
  • "With a GC, this typically means..." you don't use the GC to do that. – J D Aug 17 '12 at 19:05
14

Take it from a C programmer ... it is all about cost/benefit and appropriate use.

With garbage collection algorithms such as tri-color mark-and-sweep, there is often significant latency between a resource being 'lost' and the physical resource being freed. In some runtimes the GC will actually pause execution of the program to perform garbage collection.
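
As an illustrative sketch (not any real runtime's collector, which works on raw heap memory), a mark-and-sweep pass over a toy object graph looks like this; the full trace from the roots is exactly the work that produces the latency and pauses described above:

```java
import java.util.*;

// Toy mark-and-sweep over an explicit object graph (illustrative only).
public class MarkSweep {
    static class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean marked = false;
        Obj(String name) { this.name = name; }
    }

    // Mark phase: trace everything reachable from a root.
    static void mark(Obj o) {
        if (o.marked) return;
        o.marked = true;
        for (Obj r : o.refs) mark(r);
    }

    // Sweep phase: anything unmarked is garbage and gets dropped.
    static List<Obj> sweep(List<Obj> heap) {
        List<Obj> live = new ArrayList<>();
        for (Obj o : heap) {
            if (o.marked) { o.marked = false; live.add(o); }
            // unmarked objects are "freed" simply by not being retained
        }
        return live;
    }

    public static void main(String[] args) {
        Obj a = new Obj("a"), b = new Obj("b"), c = new Obj("c");
        a.refs.add(b); // a -> b is reachable; c is garbage
        List<Obj> heap = new ArrayList<>(List.of(a, b, c));
        mark(a);       // a is the only root
        heap = sweep(heap);
        System.out.println(heap.size()); // prints 2: only a and b survive
    }
}
```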

Being a long time C programmer, I can tell you:

a) Manual free() memory management is hard -- humans placing free() calls usually have a higher error rate than GC algorithms do.

b) Manual free() memory management costs time -- does the time spent debugging outweigh the millisecond pauses of a GC? Garbage collection is likely worthwhile if you are writing, say, a game rather than an embedded kernel.

But when you can't afford the runtime disadvantage (tight resources, real-time constraints), manual resource allocation is probably better. It may take time, but it can be 100% efficient.

Try to imagine an OS kernel written in Java, or running on the .NET runtime with its GC ... just look at how much memory the JVM accumulates when running even simple programs. I am aware that such projects exist ... they just make me feel a bit sick.

Just bear in mind: my Linux box does much the same things today with 3GB of RAM as it did when it had 512MB years ago. The only difference is that I now have mono/jvm/firefox etc. running. The business case for GC is clear, but it still makes me uncomfortable a lot of the time.

Good books:

Dragon book (recent edition), Modern Compiler Implementation in C

Aiden Bell
  • I agree with this - but... "performing manual resource allocation" in a way that's "100% efficient" often leads to basically writing your own "mini-GC". It's a LOT of effort to do well, and often leads to other problems. (For example, trying to prevent memory fragmentation is very challenging in C...) – Reed Copsey Jul 09 '10 at 17:56
  • @Aiden Bell: Not sure if this is radical, but why not, when memory will not be a problem in the near future? Improvements to an OS kernel with GC could benefit all the apps running on the OS, whereas improvements to apps only help them individually. GC could learn about the apps it runs (AI?) and behave in an optimal way. – rpattabi Jul 09 '10 at 17:56
  • @Reed Copsey -- Tell me about it. But, a well defined program that is written as a set of 'spinning' algorithms can have perfect allocation without too much of a headache. – Aiden Bell Jul 09 '10 at 17:57
  • @ragu - It is all about weighing up the cost/benefit. It may be simpler for the programmer but the cost in an OS is greater than a basic business app. – Aiden Bell Jul 09 '10 at 17:59
  • @Aiden: I only mentioned this because I've had to write my own custom allocators with compaction. If your application has a relatively flat memory usage pattern, it's pretty easy, and much more efficient, but if it's not, it's a royal pain in the .... – Reed Copsey Jul 09 '10 at 18:00
  • @ragu.pattabi: There is actually research in this area. Some serious, and a lot of hobbyist (like http://jos.sourceforge.net/). Most operating systems have their own memory management routines that do a lot of what a GC does for an executable already, though... – Reed Copsey Jul 09 '10 at 18:01
  • @Reed Copsey - I agree completely. I have rewritten some C apps because I have ended up passing context to functions and essentially taking an OOP approach ... when a C app gets that 'shape' of data I would usually go for Python and take a dive. I'm neither for nor against either method ... each is suitable under given constraints. – Aiden Bell Jul 09 '10 at 18:02
  • @Reed, correct me if I am wrong, but mostly we write allocators to avoid fragmentation or to avoid copying data on reallocation when the size is not static (vector). In these cases, a GC will be a big pain as well, if you are looking for performance. – Ramadheer Singh Jul 09 '10 at 18:03
  • @Gollum - I would say heap fragmentation is a separate issue to GC. Garbage collection doesn't help matters if you don't have a good allocation scheme - but an allocation routine that works with the GC can be beneficial if it reuses freed blocks from the GC. – Aiden Bell Jul 09 '10 at 18:05
  • @Gollum: How is a GC a "big pain" in that context? – J D Aug 19 '10 at 13:07
7

For .NET, there are two disadvantages that I can see.

1) People assume that the GC knows best, but that's not always the case. If you make certain types of allocations, you can cause yourself to experience some really nasty program deaths without directly invoking the GC.

2) Objects larger than 85k go onto the LOH, or Large Object Heap. That heap is currently NEVER compacted, so again, your program can experience out-of-memory exceptions when the LOH is really too fragmented for you to make another allocation.

Both of these bugs are shown in code that I posted in this question:

How do I get .NET to garbage collect aggressively?

mmr
  • The Large Object Heap is interesting. Is it a .NET-specific thing? – rpattabi Jul 09 '10 at 17:32
  • @ragu.pattabi: Yes. Basically, in .NET, any single 85k allocation (ie: a large array of structures) will get allocated using a "traditional" style allocation, and not be compacted with the rest of the GC heap. – Reed Copsey Jul 09 '10 at 17:33
  • @Reed Copsey: That's interesting. I see, looking at specific GC implementations could point to limitations specific to them though most of the limitations are common. – rpattabi Jul 09 '10 at 17:50
  • +1 - I have encountered some very crippling issues with the LOH and .Net, esp when doing any kind of COM-Interoperability or building Windows service modules that run for days. I wish there was a solution for forcing the purge of the LOH in .Net. – James Jul 09 '10 at 18:15
  • I wonder why the LOH allocates things on small boundaries rather than 4K boundaries? Padding an 85K+ object to the next 4K would waste at most 20% of space, and the LOH could then be compacted using the page table. As it is, my basic philosophy tend to be "avoid allocating anything over 80K". – supercat Jul 09 '10 at 18:44
4

I am interested in the limitations/disadvantages of writing applications in garbage-collected languages. Even assuming the GC implementation is optimal, I wonder whether GC limits the optimization decisions we can make.

My belief is that automatic memory management imposes a glass ceiling on efficiency but I have no evidence to substantiate that. In particular, today's GC algorithms offer only high throughput or low latency but not both simultaneously. Production systems like .NET and the HotSpot JVM incur significant pauses precisely because they are optimized for throughput. Specialized GC algorithms like Staccato offer much lower latency but at the cost of much lower minimum mutator utilisation and, therefore, low throughput.

J D
2

If you are confident in (and good at) your memory management skills, there is no advantage.

The concept was introduced to reduce development time and because of a shortage of programmers who thoroughly understood memory.

Ramadheer Singh
  • "confident" is different from "good" ;) – Sean Edwards Jul 09 '10 at 17:17
  • There are MANY advantages to a GC, even if you are very skilled and confident in your memory management. This becomes especially true if you're using a compacting GC... – Reed Copsey Jul 09 '10 at 17:21
  • The answer is a bit smug IMO. It's like saying C was invented for people that do not thoroughly understand assembler. Garbage collection greatly simplifies programming and hence allows more efficient development. Full control over memory is needed in very few places, as are the theoretical performance gains. – Janick Bernet Jul 09 '10 at 17:23
  • *Garbage collection greatly simplifies programming* - isn't that what I wrote? And I am never against GC; I just stated why it was introduced. You can differ, it's just my opinion (how I see things). C definitely made it easier for people to program than assembly. – Ramadheer Singh Jul 09 '10 at 17:26
  • Memory leaks are generally caused by things missed, not from lack of expertise. Although *I* don't write bugs, I hear they happen to other people because of this reason. – jfsk3 Jul 09 '10 at 17:27
  • @jfsk3, not just memory leaks are the problem. fragmentation too, you need to be careful. – Ramadheer Singh Jul 09 '10 at 17:36
  • @Gollum: I appreciate the point of view. I noticed at stack overflow subjective answers are discouraged, unfortunately. I think stack overflow community is conscious to prevent slashdot style discussions. – rpattabi Jul 09 '10 at 18:16
  • -1: Memory safety of *all* managed code (not just your own) is an obvious counter example. – J D Aug 19 '10 at 13:10
  • @Jon, no offense. but can you tell me why people do not run it on embedded devices? I am just saying that it reduces the time of development by abstracting the memory details, but you pay a good price for that. if your code is free of any memory problems(let's assume) which one would you prefer it to be, managed or unmanaged? – Ramadheer Singh Aug 19 '10 at 16:40
  • @Gollum: "can you tell me why people do not run it on embedded devices?". People do run managed code on embedded devices. Embedded Java is very common, particularly in smart cards. One of my customers writes embedded C#/F# for Welch Allyn's stethoscopes. Look at the number of phone apps written in Java or the number of embedded telecoms systems written in Erlang, for example. – J D Aug 20 '10 at 08:29
  • @jon, thanks but I am still confused, wouldn't the GC knock down the performance ? my arguments are theoretical, I don't have that much experience that I can judge anything. so just clearing my doubts. – Ramadheer Singh Aug 21 '10 at 20:49
  • @Gollum: "wouldn't the GC knock down the performance?" Yes, but it makes hard problems easier to solve, and software complexity is increasingly more of a concern than performance as compute power continues to improve. – J D Aug 22 '10 at 10:04
  • @JonHarrop I agree after this much experience - I was naive to put software performance above software complexity. What can I say, we become wise only with age, especially not-too-smart folks like me. – Ramadheer Singh Apr 14 '16 at 23:00
1

The biggest problem when it comes to performance (especially in real-time systems) is that your program may experience unexpected delays when the GC kicks in. However, modern GCs try to avoid this and can be tuned for real-time purposes.

Another obvious thing is that you cannot manage your memory yourself (for instance, allocating in NUMA-local memory), which you may need to do when you implement low-level software.
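
As a hedged aside, some GC'd platforms do offer limited escape hatches. On the JVM, for instance, direct buffers live outside the garbage-collected heap, so the collector never moves them (useful for I/O), though this still falls short of true NUMA-local placement:

```java
import java.nio.ByteBuffer;

// Off-heap allocation on the JVM: the buffer's backing storage sits outside
// the GC heap, so it is never moved by compaction -- but you still cannot
// choose *where* (e.g. which NUMA node) the memory is placed.
public class OffHeapDemo {
    public static void main(String[] args) {
        ByteBuffer direct = ByteBuffer.allocateDirect(1024);
        direct.putInt(0, 42);
        System.out.println(direct.isDirect() + " " + direct.getInt(0)); // prints true 42
    }
}
```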

Janick Bernet
  • Typically, if you have very low level (ie: numa local memory) requirements, you can drop into traditional allocation for that... – Reed Copsey Jul 09 '10 at 17:35
  • Contrary to what I thought, .NET has a flavor for embedded applications called .NET Micro Framework. It has GC. – rpattabi Jul 09 '10 at 17:39
  • @Reed Copsey: But then you have IMO already some hybrid system that allows both. I think most GC languages do not allow such thing. – Janick Bernet Jul 09 '10 at 17:39
1

It is almost impossible to make a non-GC memory manager work in a multi-CPU environment without requiring a lock to be acquired and released every time memory is allocated or freed. Each lock acquisition or release requires a CPU to coordinate its actions with other CPUs, and such coordination tends to be rather expensive. A garbage-collection-based system can allow many memory allocations to occur without requiring any locks or other inter-CPU coordination. This is a major advantage.

The disadvantage is that many steps in garbage collection require that the CPUs coordinate their actions, and getting good performance generally requires that such steps be consolidated to a significant degree (there's not much benefit to eliminating the requirement of CPU coordination on each memory allocation if the CPUs have to coordinate before each step of garbage collection). Such consolidation will often cause all tasks in the system to pause for varying lengths of time during collection; in general, the longer the pauses one is willing to accept, the less total time will be needed for collection.
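
That lock-free allocation fast path can be sketched with a toy thread-local bump allocator (illustrative only; real runtimes such as HotSpot use a similar idea called TLABs, thread-local allocation buffers):

```java
// Toy illustration of lock-free allocation in a GC'd runtime: each thread
// bumps a pointer in its own private arena, so the allocation fast path
// needs no locks and no inter-CPU coordination.
public class BumpAlloc {
    static class ThreadLocalArena {
        final byte[] buffer = new byte[1024];
        int top = 0;

        // Lock-free for this thread: just a bounds check and a pointer bump.
        int allocate(int size) {
            if (top + size > buffer.length) throw new OutOfMemoryError("arena full");
            int offset = top;
            top += size;
            return offset;
        }
    }

    static final ThreadLocal<ThreadLocalArena> arenas =
        ThreadLocal.withInitial(ThreadLocalArena::new);

    public static void main(String[] args) {
        ThreadLocalArena arena = arenas.get();
        int first = arena.allocate(16);
        int second = arena.allocate(16);
        System.out.println(first + " " + second); // prints 0 16
    }
}
```

When an arena fills up, the thread must fall back to a slower, coordinated path to fetch a new one; that coordination is where the consolidated, pause-inducing work described above comes in.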

If processors were to return to a descriptor-based handle/pointer system (similar to what the 80286 used, though nowadays one wouldn't use 16-bit segments anymore), it would be possible for garbage collection to be done concurrently with other operations (if a handle was being used when the GC wanted to move it, the task using the handle would have to be frozen while the data was copied from its old address to its new one, but that shouldn't take long). Not sure that will ever happen, though. (Incidentally, if I had my druthers, an object reference would be 32 bits, and a pointer would be an object reference plus a 32-bit offset; I think it will be a while before there's a need for over 2 billion objects, or for any object over 4 gigs. Despite Moore's Law, if an application had over 2 billion objects, its performance would likely be improved by using fewer, larger objects; if it needed an object over 4 gigs, its performance would likely be improved by using more, smaller objects.)

supercat
  • Interesting point of view. By the way, 2 billion objects / 4 gb an object? By that time, may be there would be alternative to OO in main stream dev, I guess :-) – rpattabi Jul 09 '10 at 18:26
  • "...it would be possible for garbage collection to be done concurrently with other operations". Garbage collection usually is done concurrently with other operations. – J D Aug 14 '12 at 20:49
  • @JonHarrop: In a typical "concurrent" GC, much of the work of the GC can be done concurrently with other operations, but some of it can't. Further, attempting to have the GC operate concurrently with other operations increases GC overhead since it must keep track of which items may have been modified since the last time the GC examined them. Having hardware-assisted descriptors to help keep track of which objects have been modified could help on both counts. – supercat Aug 14 '12 at 21:39
  • @supercat Typical GCs are only "mostly concurrent" so they do incur short stop-the-world pauses, yes. However, some GCs are fully concurrent so it is not valid to cite this as a disadvantage of garbage collection in general. In particular, your assertion "when garbage collection occurs, everything else has to stop until it's complete" is not true for all GCs. – J D Aug 17 '12 at 18:56
0

Typically, garbage collection has certain disadvantages:

  • Garbage collection consumes computing resources in deciding what memory is to be freed, reconstructing facts that may have been known to the programmer. The penalty for the convenience of not annotating object lifetime manually in the source code is overhead, often leading to decreased or uneven performance. Interaction with memory hierarchy effects can make this overhead intolerable in circumstances that are hard to predict or to detect in routine testing.
  • The point when the garbage is actually collected can be unpredictable, resulting in stalls scattered throughout a session. Unpredictable stalls can be unacceptable in real-time environments such as device drivers, in transaction processing, or in interactive programs.
  • Memory may leak despite the presence of a garbage collector, if references to unused objects are not themselves manually disposed of. This is described as a logical memory leak. For example, recursive algorithms normally delay release of stack objects until after the final call has completed. Caching and memoizing, common optimization techniques, commonly lead to such logical leaks. The belief that garbage collection eliminates all leaks leads many programmers not to guard against creating such leaks.
  • In virtual memory environments typical of modern desktop computers, it can be difficult for the garbage collector to notice when collection is needed, resulting in large amounts of accumulated garbage, a long, disruptive collection phase, and other programs' data swapped out.
  • Perhaps the most significant problem is that programs that rely on garbage collectors often exhibit poor locality (interacting badly with cache and virtual memory systems), occupy more address space than the program actually uses at any one time, and touch otherwise idle pages. These may combine in a phenomenon called thrashing, in which a program spends more time copying data between various grades of storage than performing useful work. They may make it impossible for a programmer to reason about the performance effects of design choices, making performance tuning difficult. They can lead garbage-collecting programs to interfere with other programs competing for resources.
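
The "logical memory leak" bullet is easy to reproduce. In this hypothetical sketch, a long-lived cache keeps every object strongly reachable, so no collector can reclaim them even though the program is done with them:

```java
import java.util.*;

// A "logical" memory leak: objects the program no longer needs stay reachable
// through a long-lived cache, so no garbage collector can reclaim them.
public class LogicalLeak {
    static final Map<Integer, byte[]> cache = new HashMap<>();

    static void handleRequest(int id) {
        cache.put(id, new byte[1024]); // cached "for later", but never evicted
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) handleRequest(i);
        // All 1000 buffers remain strongly reachable -- the GC cannot help.
        System.out.println(cache.size()); // prints 1000
        cache.clear(); // the fix is manual: drop the references yourself
    }
}
```

Bounded caches, or weak references (e.g. `java.util.WeakHashMap` on the JVM), are the usual ways to keep such caches collectible.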
Sedat Kapanoglu
gsoni
  • Quite amazing that I missed checking wikipedia for my question. Thanks! http://en.wikipedia.org/wiki/Garbage_collection_(computer_science)#Disadvantages – rpattabi Jul 10 '10 at 11:07
  • "Garbage collection consumes computing resources in deciding what memory is to be freed". It also saves resources by eliminating all calls to `free`. – J D Aug 17 '12 at 20:30
  • "Memory may leak despite the presence of a garbage collector". That isn't a disadvantage of garbage collection any more than the next programming error it doesn't save you from. – J D Aug 17 '12 at 20:31
  • "They may make it impossible for a programmer to reason about the performance effects of design choices". Not true. – J D Aug 17 '12 at 20:33