
I'm doing some calculations in R and need to predict the memory usage precisely beforehand.

One numeric (double) value occupies 8 bytes:

a <- data.frame(a=rnorm(10000000))
gc()
           used (Mb) gc trigger  (Mb) max used (Mb)
Ncells   218314 11.7     460000  24.6   350000 18.7
Vcells 10402305 79.4   15379586 117.4 10408785 79.5

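10,000,000 × 8 bytes = 80,000,000 bytes, which roughly matches the 79.4 Mb of Vcells in use (gc() counts 1 Mb as 1024² bytes, so the vector itself is about 76.3 Mb and the remainder is R's other vector memory).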
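The arithmetic can be checked directly in R. A minimal sketch using `object.size()` on a plain vector; the variable names are illustrative, and since the small per-vector header is platform-dependent, no exact byte count is claimed:

```r
# Expected payload of 10 million doubles at 8 bytes each
n <- 1e7
expected_bytes <- n * 8
expected_bytes              # 80,000,000 bytes
expected_bytes / 1024^2     # about 76.3, in the Mb units gc() reports

# object.size() reports the payload plus a small fixed vector header
x <- rnorm(n)
object.size(x)
format(object.size(x), units = "Mb")
```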

However, if I then convert this data frame to numeric, the memory usage is much higher, and the maximum memory usage during the conversion is also extremely high.

a <- sapply(a, as.numeric)
gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   218918  11.7    9601876 512.8 10219029 545.8
Vcells 14532255 110.9   61311656 467.8 73533764 561.1
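It may help to check what `sapply(a, as.numeric)` actually returns for a one-column data frame: `sapply()` simplifies equal-length results into a matrix, so the result is an n × 1 numeric matrix holding the same values, not a plain vector. A small sketch, assuming a smaller `n` than in the question purely to keep it quick:

```r
n <- 1e5                       # smaller than the question's 1e7
a <- data.frame(a = rnorm(n))
b <- sapply(a, as.numeric)

class(b)                       # a matrix, not a plain numeric vector
dim(b)                         # n rows, 1 column
colnames(b)                    # the data frame's column name becomes the colname

# The values themselves are unchanged by the conversion
identical(as.vector(b), a$a)
```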
  1. Why is the memory usage higher after the conversion to numeric, when the output of rnorm() is already numeric (as class(rnorm(1)) confirms)?

  2. How can I predict how much memory sapply(a, as.numeric) will use during the conversion?
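As far as I know, base R has no function that predicts the peak memory of an arbitrary call in advance, but the peak can be measured empirically on a smaller input. A minimal sketch, assuming that resetting gc()'s "max used" statistics with `gc(reset = TRUE)` and reading them after the call gives a reasonable approximation of the peak; `peak_mb()` is a hypothetical helper, not part of base R, and extrapolating to 1e7 rows assumes memory grows roughly linearly with n:

```r
# Hypothetical helper: extra memory (in the Mb units gc() reports) reached
# while evaluating `expr`, relative to what was in use just before.
peak_mb <- function(expr) {
  g0 <- gc(reset = TRUE)        # collect and reset "max used" to current usage
  force(expr)                   # evaluate the lazily passed expression now
  g1 <- gc()
  used_mb <- which(colnames(g0) == "used") + 1L      # (Mb) column after "used"
  max_mb  <- which(colnames(g1) == "max used") + 1L  # (Mb) column after "max used"
  # peak (Ncells + Vcells) during evaluation minus the baseline
  sum(g1[, max_mb]) - sum(g0[, used_mb])
}

# Measure on a smaller input than the question's 1e7 rows
n <- 1e6
a <- data.frame(a = rnorm(n))
peak_sapply <- peak_mb(sapply(a, as.numeric))
peak_sapply
```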

Parsa
  • Using `gdata` and `object.size`, I see that they have very comparable sizes (always a difference of 208 bytes), so it's not about the size of the objects but only about the conversion process; your first question is ambiguous in this respect. `gc()` doesn't measure object size; as far as I understand, it just frees allocated memory in case you need it outside of R. – moodymudskipper Aug 09 '17 at 16:20
  • @Moody_Mudskipper 79.4 vs 110.9 Mb is quite a difference? From the R docs: "A call of gc causes a garbage collection to take place. This will also take place automatically without user intervention, and the primary purpose of calling gc is for the report on memory usage." – Parsa Aug 09 '17 at 16:29
  • yes it's a big difference and your question is interesting, but it's not a question about object size. `library(gdata);a <- data.frame(a=rnorm(10000000));b <- sapply(a, as.numeric);object.size(a);object.size(b)` – moodymudskipper Aug 09 '17 at 16:32
  • @Moody This seems to cover it https://stackoverflow.com/q/14580233/ "What GC does is [yada yada], this does not mean that this memory is released to the OS." Seems like a dupe. – Frank Aug 09 '17 at 16:40
  • Hmm, interesting. So I guess the second question is: why does sapply(a, as.numeric) require so much additional memory, and how could the memory usage of this function be calculated beforehand? – Parsa Aug 09 '17 at 17:30
  • Isn't it odd that a 79.5 Mb data frame temporarily takes up over half a gigabyte during the sapply(df, as.numeric) call? – Parsa Aug 09 '17 at 17:30
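The object-size comparison suggested in the comments can be run without gdata, since `object.size()` is provided by the utils package that R attaches by default. A sketch assuming a smaller `n` for speed; the exact byte difference depends on the platform and R version, so it is not asserted here:

```r
n <- 1e6                       # smaller than the question's 1e7
a <- data.frame(a = rnorm(n))
b <- sapply(a, as.numeric)

size_a <- as.numeric(utils::object.size(a))
size_b <- as.numeric(utils::object.size(b))

# Both are dominated by the n * 8 bytes of doubles; only the attribute
# overhead (names/row.names/class vs dim/dimnames) differs.
c(data_frame = size_a, matrix = size_b, difference = size_b - size_a)
```

If the two sizes are this close, the extra memory reported by gc() comes from the conversion step itself rather than from the resulting object.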
