I'm running R on Linux (Kubuntu Trusty). I have a CSV file that's nearly 400MB and contains mostly numeric values:
$ ls -lah combined_df.csv
-rw-rw-r-- 1 naught101 naught101 397M Jun 10 15:25 combined_df.csv
I start R and run df <- read.csv('combined_df.csv'), which gives me a 1246536x25 data frame (3 int columns, 3 logi, 1 factor, and 18 numeric). I then use the script from here to check memory usage:
R> .ls.objects()
         Type  Size    Rows Columns
df data.frame 231.4 1246536      25
Bit odd that it's reporting less memory, but I guess that's just because CSV isn't an efficient storage method for numeric data.
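A rough back-of-the-envelope check of that guess, as a sketch only: the row and column counts come from the data frame described above, and the byte widths are R's usual storage sizes (8 bytes per double, 4 per integer or logical, and 4 per factor code, since factors are stored as integers). The variable names are made up for illustration, and the estimate ignores factor levels, attributes, and per-vector overhead.

```r
# Dimensions taken from the data frame described in the question.
n_rows <- 1246536
n_cols <- c(numeric = 18, integer = 3, logical = 3, factor = 1)

# Bytes per stored value for each column type (factor codes are integers).
bytes_per_value <- c(numeric = 8, integer = 4, logical = 4, factor = 4)

# Raw payload of the column vectors, ignoring attributes and headers.
estimated_bytes <- n_rows * sum(n_cols * bytes_per_value)
estimated_mib <- estimated_bytes / 1024^2

print(round(estimated_mib, 1))
```

That works out to a little over 200 MiB, which is in the same ballpark as the 231.4 reported above (assuming that figure is in megabytes) and well under the size of the CSV file.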
But when I check the system memory usage, top says that R is using 20% of my available 8GB of RAM, and ps reports similar:
$ ps aux|grep R
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
naught1+ 32364 5.6 20.4 1738664 1656184 pts/1 S+ 09:47 2:42 /usr/lib/R/bin/exec/R
1.7GB of RAM for a 397MB data set. That seems excessive. I know that ps isn't necessarily an accurate way of measuring memory usage, but surely it isn't out by a factor of 5?! Why does R use so much memory?
Also, R seems to report something similar in gc()'s output:
R> gc()
           used  (Mb) gc trigger  (Mb)  max used  (Mb)
Ncells   497414  26.6    9091084 485.6  13354239 713.2
Vcells 36995093 282.3  103130536 786.9 128783476 982.6
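To make the comparison with ps concrete, here is a small sketch that redoes the arithmetic behind the (Mb) columns and sets it beside the RSS figure. The cell counts and the RSS value are copied from the output above. The per-cell sizes are an assumption: 56 bytes per Ncell and 8 bytes per Vcell, which is what R's memory documentation gives for 64-bit builds. The variable names are illustrative only.

```r
# Cell counts copied from the gc() output in the question.
cells <- rbind(
  Ncells = c(used = 497414, trigger = 9091084, max_used = 13354239),
  Vcells = c(used = 36995093, trigger = 103130536, max_used = 128783476)
)

# Assumed cell sizes on a 64-bit build: 56 bytes per Ncell, 8 per Vcell.
# Recycling is column-major, so row 1 is scaled by 56 and row 2 by 8.
cell_bytes <- c(56, 8)
mib <- cells * cell_bytes / 1024^2

# Total across both cell types for each gc() column.
totals <- colSums(mib)

# RSS from the ps output is in KiB; convert to MiB for comparison.
rss_mib <- 1656184 / 1024

print(round(mib, 1))
print(round(totals, 1))
print(round(rss_mib, 1))
```

Under those assumptions the "used" total comes to roughly 300 MiB, while the "max used" total is close to the RSS that ps reports, so the two tools do seem to agree on the peak rather than on what is currently in use.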