Taken literally, I am not aware of any existing implementation of Map or Set that never produces any garbage on adding and removing a key.
In fact, the only way that it would even be technically possible (in Java, using the Map
and Set
APIs as defined) is if you were to place a strict upper bound on the number of entries. Practical Map and Set implementations need extra state proportional to the number of elements they hold. This state has to be stored somewhere, and when the current allocation is exceeded that storage needs to be expanded. In Java, that means that new nodes need to be allocated.
(OK, you could designed a data structure class that held onto old useless nodes for ever, and therefore never generated any collectable garbage ... but it is still generating garbage.)
So what can you do about this in practice ... to reduce the amount of garbage generated. Let's take HashMap
as an example:
Garbage is created when you remove an entry. This is unavoidable, unless you replace the hash chains with an implementation that never releases the nodes that represent the chain entries. (And that's a bad idea ... unless you can guarantee that the free node pool size will always be small. See below for why it is a bad idea.)
Garbage is created when the main hash array is resized. This can be avoided in a couple of ways:
You can give a 'capacity' argument in the HashMap constructor to set the size of the initial hash array large enough that you never need to resize it. (But that potentially wastes space ... especially if you can't accurately predict how big the HashMap
is going to grow.)
You can supply a ridiculous value for the 'load factor' argument to cause the HashMap to never resize itself. (But that results in a HashMap whose hash chains are unbounded, and you end up with O(N)
behaviour for lookup, insertion, deletion, etc.
In fact, creating garbage is not necessarily bad for performance. Indeed, hanging onto nodes so that the garbage collector doesn't collect them can actually be worse for performance.
The cost of a GC run (assuming a modern copying collector) is mostly in three areas:
- Finding nodes that are not garbage.
- Copying those non-garbage nodes to the "to-space".
- Updating references in other non-garbage nodes to point to objects in "to-space".
(If you are using a low-pause collector there are other costs too ... generally proportional to the amount of non-garbage.)
The only part of the GC's work that actually depends on the amount of garbage, is zeroing the memory that the garbage objects once occupied to make it ready for reuse. And this can be done with a single bzero
call for the entire "from-space" ... or using virtual memory tricks.
Suppose your application / data structure hangs onto nodes to avoid creating garbage. Now, when the GC runs, it has to do extra work to traverse all of those extra nodes, and copy them to "to-space", even though they contain no useful information. Furthermore, those nodes are using memory, which means that if the rest of the application generates garbage there will be less space to hold it, and the GC will need to run more often.
And if you've used weak/soft references to allow the GC to claw back nodes from your data structure, then that's even more work for the GC ... and space to represent those references.
Note: I'm not claiming that object pooling always makes performance worse, just that it often does, especially if the pool gets unexpectedly big.
And of course, that's why HashMap and similar general purpose data structure classes don't do any object pooling. If they did, they would perform significantly badly in situations where the programmer doesn't expect it ... and they would be genuinely broken, IMO.
Finally, there is an easy way to tune a HashMap so that an add immediately followed by a remove of the same key produces no garbage (guaranteed). Wrap it in a Map class that caches the last entry "added", and only does the put
on the real HashMap
when the next entry is added. Of course, this is NOT a general purpose solution, but it does address the use case of your earlier question.