
I have some variables that I never need again once they have been used, so I want to remove them and release the memory, but rm() does not seem to help:

memory.size()
30.69
tmp=matrix(rnorm(6e5*20),6e5,20)
memory.size()
207.64
rm(tmp)
memory.size()
207.64

Does this mean that tmp is removed but its memory is not released?

JasonMArcher
PepsiCo
  • What happens after `gc()`? – Ben Mar 22 '13 at 01:56
  • Great! gc() is what I need! – PepsiCo Mar 22 '13 at 02:00
  • Sorry, I have another question: as my program runs, the memory used grows larger and larger, so is it necessary to add some gc() calls during the program? I mean adding gc() between pieces of code, such as #codes# gc() #codes# gc() #codes#. Would that be helpful? – PepsiCo Mar 22 '13 at 03:26
  • No, it's not needed; gc is called by the background process at specific intervals. The best way to avoid memory issues is to break the code into a lot of smaller functions and only return the needed elements. Everything else within the function should be disposed of automatically the next time the R process engages garbage collection. – Hansi Mar 22 '13 at 13:23
  • @Hansi that is not always true, see the discussion here http://www.stackoverflow.com/questions/1467201/forcing-garbage-collection-to-run-in-r-with-the-gc-command – Ben Mar 22 '13 at 14:43
  • possible duplicate of [Increasing the memory available to R processes](http://stackoverflow.com/questions/1395229/increasing-the-memory-available-to-r-processes) – Waldir Leoncio Dec 09 '13 at 14:12
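
The suggestion in the comments to keep large temporaries inside small functions can be sketched roughly as follows. This is only an illustration: the function name `column_means` and the matrix size are hypothetical, not taken from the thread. The large matrix exists only inside the function, so once the function returns nothing references it and R can reclaim it at the next garbage collection.

```r
# Hypothetical helper: builds a large temporary matrix but returns
# only a small summary of it
column_means <- function(n_rows, n_cols) {
  big <- matrix(rnorm(n_rows * n_cols), n_rows, n_cols)  # large temporary
  colMeans(big)                                          # small result
}

means <- column_means(1e5, 20)

# Only the 20-element result survives in the calling environment;
# the temporary matrix `big` is unreachable after the call
print(length(means))
print(object.size(means), units = "Kb")
```

Whether an explicit gc() call is still worthwhile afterwards is the point debated in the comments above; this sketch only shows the scoping idea.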

1 Answer


I use gc() to free up RAM between operations. Below is an example of how I use it in a loop; see here for a more detailed discussion of gc() and here for more on memory management during an R session.

# load library
library(topicmodels)

# get data
data("AssociatedPress")

# set number of topics to start with
k <- 20

# set model options
control_LDA_VEM <-
  list(estimate.alpha = TRUE, alpha = 50/k, estimate.beta = TRUE,
       verbose = 0, prefix = tempfile(), save = 0, keep = 0,
       seed = as.integer(100), nstart = 1, best = TRUE,
       var = list(iter.max = 10, tol = 10^-6),
       em = list(iter.max = 10, tol = 10^-4),
       initialize = "random")


# create the sequence that stores the number of topics to 
# iterate over
sequ <- seq(20, 300, by = 20)

# basic loop to iterate over different topic numbers, with gc()
# after each run to free up RAM
# (the list is preallocated and filled by position, so it never grows)
lda <- vector(mode = "list", length = length(sequ))
for (i in seq_along(sequ)) {
  lda[[i]] <- LDA(AssociatedPress[1:20, ], sequ[i], method = "VEM", control = control_LDA_VEM)
  gc() # garbage collection to free up memory before the next round of the loop
}

# collect the log likelihood of each model in a data frame
best.model.logLik <- data.frame(logLik = sapply(lda, function(m) as.numeric(logLik(m))), ntopic = sequ)

# plot
with(best.model.logLik, plot(ntopic, logLik, type = 'l', xlab="Number of topics", ylab="Log likelihood"))

[Figure: line plot of log likelihood against number of topics]

# print ordered dataframe to see which number of topics has the highest log likelihood
(best.model.logLik.sort <- best.model.logLik[order(-as.numeric(best.model.logLik$logLik)), ]) 
    logLik       ntopic
2  -17904.12     40
3  -18105.48     60
1  -18181.84     20
4   -18569.7     80
5  -19736.94    100
6   -21919.6    120
7  -23785.08    140
8  -24914.23    160
9  -25493.76    180
10 -25837.64    200
11 -25964.23    220
12 -26061.01    240
13 -26117.92    260
14 -26149.44    280
15 -26168.91    300
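
Coming back to the original question, the rm() followed by gc() pattern can be checked with a minimal sketch like the one below. It reads the matrix that gc() returns instead of calling memory.size(), which is Windows-only. The smaller matrix size and the name `freed_mb` are illustrative assumptions. What the operating system reports for the R process may differ from what gc() reports.

```r
# Allocate a matrix of 1e5 x 20 doubles (roughly 15 Mb; smaller than
# the one in the question, to keep the sketch quick)
tmp <- matrix(rnorm(1e5 * 20), 1e5, 20)

# gc() returns a matrix; row "Vcells", column 2 is the vector memory
# currently in use, in Mb
before <- gc()["Vcells", 2]

rm(tmp)                     # removes the name; the memory is not yet reclaimed
after <- gc()["Vcells", 2]  # collection reclaims the unreferenced matrix

freed_mb <- before - after  # illustrative name
print(freed_mb)
```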
Community
Ben
  • This is a classic case of the second circle of R hell. Never grow vectors (or objects) within a loop. `lda <- vector(mode='list', length = seq)` (`seq` is a function name in R; it is a good idea to avoid these as object names, as it may lead to confusion). You probably want to assign with `[[<-`, not `[<-`, if `k` is always an integer of length 1. – mnel Mar 22 '13 at 04:56
  • @mnel thank you for your instructive and detailed comment! I've edited my answer accordingly. – Ben Mar 22 '13 at 05:12
  • I should add that starting at 20 topics for 20 documents is a bit silly. I should have started with fewer topics than documents, e.g. `sequ <- seq(5, 200, by = 5)` – Ben Mar 22 '13 at 05:56
  • On my computer gc() releases some of the memory, but it's not perfect. If I load a large object, do something with it, delete it and use gc(), I don't get back the same free memory I had at the beginning. The more things I do, the more memory I'm unable to recover. In the end, after many operations with big objects, I can run out of memory. I'm on Windows 10 x64 with 16GB of RAM. – skan Oct 05 '16 at 13:52
  • This question was very useful to me. I think the last comment from @skan deserves a reply. skan, if you already have an answer, please post it here. – Jecogeo Jan 15 '18 at 16:03
  • @Jecogeo unfortunately I don't have a proper answer. I think R doesn't manage the memory properly. That's why many people move to other languages. – skan Jan 16 '18 at 19:07
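
The preallocation advice in the first comment above can be illustrated with a small sketch. The names `sizes` and `results` are hypothetical and the work done in each iteration is a stand-in; the point is only that the list is created at its final length once and then filled by position with `[[<-`.

```r
# Values to iterate over (illustrative)
sizes <- seq(5, 20, by = 5)

# Preallocate a list with one slot per iteration, instead of growing it
results <- vector(mode = "list", length = length(sizes))

for (i in seq_along(sizes)) {
  # `[[<-` stores one object in slot i; `[<-` would expect a list on
  # the right-hand side
  results[[i]] <- rnorm(sizes[i])
}

# One element per iteration, each of the requested length
print(lengths(results))
```

Indexing by position (`i`) rather than by the value itself keeps the list at the length it was allocated with.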