I'm doing some calculations in R and need to precisely predict the memory usage beforehand.
A single numeric value occupies 8 bytes:
a <- data.frame(a=rnorm(10000000))
gc()
            used (Mb) gc trigger  (Mb) max used (Mb)
Ncells    218314 11.7     460000  24.6   350000 18.7
Vcells  10402305 79.4   15379586 117.4 10408785 79.5
10,000,000 * 8 bytes does come to roughly 80 Mb, which matches the ~79.4 Mb reported for Vcells above.
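As a rough sanity check (assuming object.size() accounts for the column's full storage, plus a small amount of data-frame overhead), the same figure can be read directly from the object:

object.size(a)                        # roughly 80,000,000 bytes: 1e7 doubles at 8 bytes each
print(object.size(a), units = "Mb")   # note object.size() uses 1 Mb = 2^20 bytes, so this prints ~76 Mb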
However, if I then convert this data frame to numeric with sapply, the reported memory usage is much higher, and the peak memory usage during the conversion is extremely high:
a <- sapply(a, as.numeric)
gc()
            used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells    218918  11.7    9601876 512.8 10219029 545.8
Vcells  14532255 110.9   61311656 467.8 73533764 561.1
Why is the size (memory) of the data higher after the conversion to numeric, when the output of rnorm() is already numeric? We can check this with class(rnorm(1)).
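For example, checking directly in the console:

class(rnorm(1))
# [1] "numeric"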
How can I predict how much memory sapply(a, as.numeric) will need during the conversion itself?
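For reference, this is roughly how I am measuring the peak (a sketch; it assumes gc(reset = TRUE) resets the "max used" statistics, and it assigns the result to b instead of overwriting a so the two objects can be compared afterwards):

a <- data.frame(a = rnorm(10000000))
gc(reset = TRUE)              # reset the "max used" statistics before the conversion
b <- sapply(a, as.numeric)    # sapply simplifies the one-column result to a matrix
gc()                          # "max used" now reflects the peak hit during the conversion
object.size(a)                # original data frame: roughly 80,000,000 bytes
object.size(b)                # converted result: roughly the same size again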