As the title suggests, I am trying to fully understand memory constraints in R, because I have a project that is quickly growing in scale and I am worried that memory will soon become a major issue.
I am aware of object.size, and I get the following output when I run it on the largest item in my environment:
> object.size(raw.pbp.data)
457552240 bytes
...so the largest item is ~457 MB. I have also checked my MacBook Pro's memory (About This Mac --> Storage), which shows my Memory as 8 GB 1600 MHz DDR3, so I assume I have 8 GB to work with.
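For what it's worth, I have been converting the byte count to MB by hand; I believe object.size output can also be formatted directly with a units argument (binary Mb here), if that matters for the answer:

format(object.size(raw.pbp.data), units = "Mb")   # prints "436.4 Mb" for the 457552240 bytes above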
Obviously the 457 MB data frame is not the only object in my R environment, but I do not want to manually run object.size on every single object and add up the bytes to find the total memory used. Is there a better way to do this? A function that tells me the total memory used by all objects in my RStudio environment would be great. Does such a function exist?
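To be concrete, this is roughly what I am doing by hand at the moment (the object names are just from my own session); I am hoping a single built-in function replaces this:

obj.sizes <- sapply(ls(), function(nm) object.size(get(nm, envir = .GlobalEnv)))
sort(obj.sizes, decreasing = TRUE)   # per-object sizes in bytes, largest first
sum(obj.sizes)                       # total bytes used by everything in the environment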
Also, what happens when I get closer to 8 GB - is my R script going to stop working? I'm anticipating my data will increase by a factor of 5-10x in the near future, which will probably bring the total memory used in the environment close to, or even greater than, 8 GB.
Lastly, if hitting 8 GB of memory is going to halt my R script from running, what are my options? If I convert my data frame into a data.table, could that reduce the overall size of the object?
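To clarify what I mean by the data.table idea, this is the comparison I am imagining (I have not tested whether it actually helps):

library(data.table)
pbp.dt <- as.data.table(raw.pbp.data)   # same data, stored as a data.table instead of a data.frame
object.size(pbp.dt)                     # compare against the 457552240 bytes above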
Any help with this is greatly appreciated, thanks!!
Edit: Saved as a .rda file, raw.pbp.data is only 32 MB, so that makes me optimistic that there is a way to potentially reduce its size when loaded into R.
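For reference, this is roughly how I produced that file (the file name here is just illustrative):

save(raw.pbp.data, file = "raw.pbp.data.rda")   # the .rda mentioned above
file.size("raw.pbp.data.rda") / 1024^2          # comes out to roughly 32 MB on disk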