91

I am periodically cleaning the memory in R using a call to rm(list=ls()).
Do I need to call the garbage collector gc() after that?

What is the difference between these 2 functions? Does gc() call rm() for certain variables?

Seb
RockScience

3 Answers

121

First, it is important to note that the two are very different: gc() does not delete any variables that you are still using. It only frees the memory of objects that you can no longer access (whether they were removed using rm() or, say, created in a function that has since returned). Running gc() will never make you lose variables.
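As a minimal sketch of that distinction (the names `x`, `make_temp`, and `tmp` are made up for illustration), a variable that is still bound to a name survives a collection, while a function's local object becomes collectible once the function has returned:

```r
x <- runif(1e6)            # still reachable through the name `x`

make_temp <- function() {
  tmp <- runif(1e6)        # local object; unreachable after the function returns
  sum(tmp)
}
total <- make_temp()

invisible(gc())            # may reclaim `tmp`'s memory, never touches `x`

still_there <- exists("x") && length(x) == 1e6
```

Here `still_there` should be TRUE regardless of how many times gc() runs, since gc() only reclaims objects that nothing refers to any more.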

The question of whether you should call gc() after calling rm(), though, is a good one. The documentation for gc helpfully notes:

A call of gc causes a garbage collection to take place. This will also take place automatically without user intervention, and the primary purpose of calling gc is for the report on memory usage.

However, it can be useful to call gc after a large object has been removed, as this may prompt R to return memory to the operating system.

So the answer is that it can be good to call gc() (and at the very least, can't hurt), even though it would likely be triggered anyway (if not right away, then soon).
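A small sketch of the rm() then gc() pattern, assuming a hypothetical large vector called `big`. It uses the matrix that gc() returns (rows "Ncells" and "Vcells", second column the megabytes in use) to see roughly how much was released; the exact figures will vary by platform and session:

```r
big <- numeric(1e7)                 # hypothetical large object, roughly 76 MB

before <- gc()                      # usage report while `big` is still reachable
rm(big)                             # removes the name only
after  <- gc()                      # collects the now-unreachable vector

# Column 2 holds the Mb currently used; the "Vcells" row covers vector storage.
freed_mb <- before["Vcells", 2] - after["Vcells", 2]
```

Whether the freed memory is actually handed back to the operating system depends on the platform's allocator, which is why the documentation only says it "may" prompt R to return it.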

Community
David Robinson
  • Thank you for your answer. Generally speaking, is R's automatic garbage collection considered good (as robust as Java's, for instance)? – RockScience Jan 11 '12 at 03:51
  • That's a difficult question to answer, I'm not sure. [This question](http://stackoverflow.com/questions/1467201/forcing-garbage-collection-to-run-in-r-with-the-gc-command) is useful. – David Robinson Jan 11 '12 at 04:52
  • 7
    Generally you should not have to call gc, and it's unlikely to make much difference if you do. – hadley Jan 11 '12 at 13:20
  • 24
    @hadley That doesn’t align at all with my experience. On the contrary, R often causes my operating system to swap even after large objects (~ a few hundred MiBs) are no longer available. Manually calling `gc()`, however, avoids this. Using available memory is OK, unnecessarily swapping really is not, since it negatively impacts the usability of the OS. State of the art GCs handle this much better. – Konrad Rudolph Oct 29 '15 at 15:54
  • 6
    Seconding @KonradRudolph's comment - in some recent work I've been noticing enormous amounts of memory consumed by a function's local variables. The memory is not freed when the variables go out of scope, as it would be in other languages. I had to call `gc()`. – Paul Aug 20 '17 at 12:37
  • 2
    For those happening upon this question today and subsequently reading this answer, I just removed some variables from my environment - which took up ~15GB in memory - yet R was still struggling to store any further variables despite having removed the large ones that I no longer needed. I then ran `gc()`, waited a minute, and then discovered that all of the memory had been reallocated to my session, thus allowing me to save variables once again. I'd like to reiterate what the `gc` documentation states: `...it can be useful to call gc after a large object has been removed...` – Mus May 26 '21 at 09:48
2

Regarding ThankGoat's comment on the gc() performance penalty: while this is true, one could of course decide to call gc() only every N iterations in a loop (where N can be parameterised in a number of ways). For loops where the number of iterations is large but the resource usage within a given iteration is more modest, it may well not be necessary to run the garbage collector on each and every iteration in order to regain the desired performance.
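One way this might look, as a rough sketch: `gc_every` is a hypothetical value for N, and the `mean(rnorm(1000))` call is just a stand-in for the real per-iteration work.

```r
n_iter   <- 1000
gc_every <- 100                        # hypothetical N; tune for the workload
res      <- vector("list", n_iter)     # preallocate the result list
gc_calls <- 0

for (i in seq_len(n_iter)) {
  res[[i]] <- mean(rnorm(1000))        # stand-in for the real work
  if (i %% gc_every == 0) {            # collect only on every N-th iteration
    invisible(gc())
    gc_calls <- gc_calls + 1
  }
}
```

With these values the collector is invoked explicitly 10 times instead of 1000, so the per-call overhead is paid far less often.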

Of course, if you're looping over a very large number of very high-usage iterations, it's a different story, but at that stage it may well be the case that the code simply needs to be vectorised and/or maybe even written in another language.

Pascoe
1

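Personally, I like to include gc() in loops to free up some RAM when the loops start filling up the available space. Something like:

some_operation <- function(i) rnorm(10)  # hypothetical stand-in for the real work
res <- vector("list", 1000)              # preallocate the result list
for (i in seq_along(res)) {
  res[[i]] <- some_operation(i)
  gc()                                   # force a collection on every iteration
}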
Gabriel123
  • Please be aware that calling gc() comes with a hefty performance penalty, on the order of 100 ms per call. So in this case your code will run about 100 seconds longer than necessary :) – ThankGoat Nov 24 '17 at 08:39
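The per-call cost depends on the machine and on how much is allocated in the session, so it may be worth measuring rather than assuming a figure. A quick sketch with system.time(), where `n_calls` is an arbitrary sample size chosen for illustration:

```r
n_calls <- 20                                   # arbitrary number of timed calls
timing  <- system.time(
  for (i in seq_len(n_calls)) gc()
)

# Average wall-clock cost of one gc() call, in milliseconds
ms_per_call <- unname(timing["elapsed"]) * 1000 / n_calls
```

Multiplying `ms_per_call` by the number of loop iterations gives a rough estimate of the overhead added by calling gc() on every pass.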