I have a data.frame called my.data with 2,000 columns, and object.size(my.data) returns around 450 MB.

I removed the unnecessary columns by keeping only the ones I need (around 300 columns):

my.data <- my.data[,vars.keep]
gc()

Then I checked object.size(my.data) again and it's still 450 MB. How can I get the memory back in an efficient way?


Just want to add some more info: Maurits is right. After removing the unnecessary columns, the object size decreased dramatically, but rsession still uses the same amount of memory. Why doesn't rsession release the memory?

format(object.size(transformed_data), units = "Mb");
[1] "40.5 Mb"
transformed_data <- transformed_data[,vars.keep]
format(object.size(transformed_data), units = "Mb");
[1] "5.3 Mb"

1 Answer

That doesn't sound right. Can you double-check your code and re-run it in a fresh R instance?

This is what I get for a sample data frame:

df <- as.data.frame(matrix(1e6, ncol = 2000, nrow = 1000));
format(object.size(df), units = "Mb");
# [1] "15.5 Mb"

# Select only the first 100 columns
df <- df[, 1:100];
format(object.size(df), units = "Mb");
# [1] "0.8 Mb"

As to running gc: According to this post, running the garbage collector "can be good [...] (and at the very least, can't hurt), even though it would likely be triggered anyway (if not right away, then soon)." On the other hand, Hadley Wickham comments that "you should not have to call gc, and it's unlikely to make much difference if you do."
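One rough way to tell the size of the object apart from what R's own heap reports is to read the matrix that gc() returns before and after dropping the columns. This is only a sketch on the sample data frame above; the variable names (used_before, size_before, and so on) are illustrative, and it assumes the "Vcells" row and second column ("used", in Mb) of the gc() result, which is how base R lays it out.

```r
# Hypothetical sample data, mirroring the example above
df <- as.data.frame(matrix(1e6, ncol = 2000, nrow = 1000))

# gc() returns a matrix; the "Vcells" row covers vector heap memory,
# and column 2 reports the amount currently in use, in Mb
used_before <- gc()["Vcells", 2]
size_before <- as.numeric(object.size(df))

# Keep only the first 100 columns, then collect again
df <- df[, 1:100]
used_after <- gc()["Vcells", 2]
size_after <- as.numeric(object.size(df))

size_after < size_before   # is the object itself smaller?
used_after < used_before   # has R's own heap usage dropped as well?
```

Both comparisons concern memory as R accounts for it. Whether the operating system shows a smaller process afterwards is a separate question and depends on the platform's allocator.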

Maurits Evers
  • Thanks Maurits, let me check and get back to you – potato.tickly Oct 23 '17 at 21:29
  • Hi Maurits, yes, you are right. After removing columns, the object size decreased dramatically. But in the Windows resource manager, the memory usage of rsession.exe doesn't change; the number stays the same. So this still doesn't help, because I would like to release some memory. Is this because rsession holds on to the freed memory and reuses it later when some other object is created? – potato.tickly Oct 30 '17 at 14:33
  • I think this is probably an unrelated issue. Memory management of R itself will be OS dependent. I'm not a Windows user, but on Linux some libraries allocate memory in chunks, and can only release memory once that whole chunk has been freed. In general, your OS will release memory if it can. It might be worth monitoring your memory usage, using `shell('systeminfo | findstr Memory')` before and after resizing an R object. Also have a look [here](https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14611) for an extended discussion on how R releases memory, dating back to 2011. – Maurits Evers Oct 31 '17 at 09:21
  • Thanks so much for detailed suggestion. – potato.tickly Oct 31 '17 at 14:11