As the title suggests, I am trying to fully understand memory constraints with R because I have a project that is quickly growing in scale, and I am worried that memory constraints will soon become a major issue.

I am aware of object.size, and I get the following output when I run it on the largest item in my environment:

> object.size(raw.pbp.data)
457552240 bytes 

...so the largest item is ~457 MB. I have also checked my MacBook Pro's memory in About This Mac --> Storage, and it shows my Memory as 8 GB 1600 MHz DDR3, so I assume I have 8 GB to work with.

Obviously the 457MB dataframe is not the only object in my R environment, but I do not want to manually run object.size for every single object and add up the bytes to find the total size of memory used. Is there a better way to do this? A function that tells me the memory used in total by all objects in my RStudio Environment would be great. Does such a function exist?

Also, what happens when I get closer to 8 GB? Is my R script going to stop working? I'm anticipating that my data will grow by a factor of 5-10x in the near future, which will probably bring the total memory used in the environment close to, or even above, 8 GB.

Lastly, if hitting 8 GB of memory is going to halt my R script, what are my options? If I convert my data.frame into a data.table, could that reduce the overall size of the object?

Any help with this is greatly appreciated, thanks!!

Edit: saved as a .rda file, raw.pbp.data is only 32MB, so that makes me optimistic that there is a way to potentially reduce its size when loaded into R.

Canovice

1 Answer

I am not aware of a single function that does this, but the following works. You could wrap it in a function:

# Size in bytes of each object in the global environment
sizes <- vapply(eapply(globalenv(), object.size, USE.NAMES = FALSE), as.numeric, numeric(1))
# Total size of those objects, in bytes
sum(sizes)
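
Wrapped in a function, the same idea might look like the sketch below. The name `env_sizes` is made up for illustration. It returns the per-object sizes sorted from largest to smallest, so you can see both the total and which objects dominate it.

```r
# Hypothetical helper: size in bytes of every object in an environment,
# sorted from largest to smallest.
env_sizes <- function(env = globalenv()) {
  sizes <- vapply(eapply(env, object.size), as.numeric, numeric(1))
  sort(sizes, decreasing = TRUE)
}

sizes <- env_sizes()
head(sizes)   # the largest objects
sum(sizes)    # total bytes across all objects

# Same total, printed in megabytes
format(structure(sum(sizes), class = "object_size"), units = "MB")
```

Note that this only counts named objects in the environment. Base R's `gc()` also prints the memory the R session is using as a whole, which may be closer to what you will hit against the 8 GB limit.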

Besides the obvious (running this on a server or buying more RAM), I've heard data.table is more efficient than data.frame. Try using it. The syntax is more concise too! I cannot recommend data.table enough.
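
One hedged note on what to expect: as far as I know, the gain from data.table comes mostly from modifying data by reference instead of copying it, not from the stored object being much smaller. The sketch below assumes the data.table package is installed; `df` and its column names are made-up stand-ins for raw.pbp.data.

```r
# Made-up stand-in for raw.pbp.data; the column names are hypothetical.
df <- data.frame(id = seq_len(1e5), value = rnorm(1e5))
size_df <- as.numeric(object.size(df))

# Assumption: the data.table package is installed.
if (requireNamespace("data.table", quietly = TRUE)) {
  data.table::setDT(df)  # converts df to a data.table by reference
  size_dt <- as.numeric(object.size(df))

  # Add a column by reference instead of copying the whole table
  data.table::set(df, j = "value2", value = df$value * 2)

  # Compare the sizes before and after conversion, in bytes
  print(c(data.frame = size_df, data.table = size_dt))
}
```

Run it on your own data to see whether the two sizes differ enough to matter for your case.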

Arturo Sbr