
I'm trying to run a function that multiplies a variable in a dataset by a scalar taken from a separate weight dataset. The choice of scalar depends on the year and location, which correspond to the rows and columns of the weight dataset. R, however, cannot return the object due to its size, which surprised me since it is only 15.6 Mb; there are >2 million records. When I run the function with example variables, a much larger size is reported (about 30 Tb).

new.value.fn <- function(variable, year, location, weight) {
  row <- year - 2000                             # the year determines the row of the weight table
  new.value <- variable * weight[row, location]  # intended: one weight per record, by year row and location column
  return(new.value)
}
variable <- rnorm(2000000, 900, 750)
variable <- ifelse(variable < 0, 0, variable)
year <- runif(2000000, min = 2001, max = 2015)
location <- runif(2000000, min = 1, max = 7)
weight <- matrix(runif(14*7, min = 1, max = 1.3), ncol=7)  # 14 year rows x 7 location columns
gc()
new.value.fn(variable, year, location, weight)
# Error: cannot allocate vector of size 29802.3 Gb
gc()
new.value.fn(actual.var, actual.year, actual.location, actual.weight)
# Error: cannot allocate vector of size 15.6 Mb 
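
While writing this up I also tried the subscript expression on its own at a small scale, in case the way the weight matrix is being indexed is relevant. This is just base R matrix indexing with toy vectors (m, i and j are made up for illustration), not my real data:

# toy check: indexing a matrix with two length-3 vectors returns a 3 x 3
# block (every row/column combination), not 3 element-wise values
m <- matrix(runif(14 * 7, min = 1, max = 1.3), ncol = 7)
i <- c(1, 5, 9)
j <- c(2, 3, 6)
dim(m[i, j])
# [1] 3 3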

Running gc() beforehand, as suggested in the answers to this question, does not change this. What is more surprising is that memory.size() reports nearly 30 GB, yet R cannot handle 15.6 Mb, which is approximately the size of the original vector:

> memory.size()
[1] 28691.74
> object.size(variable)
16390984 bytes

My question is: why can't R allocate a vector that is much smaller than the memory actually available? It may be related to the fact that the call on the actual data also requires far more memory than the error message suggests.
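
For what it's worth, the size in the first error message is roughly what a 2,000,000 x 2,000,000 matrix of doubles would occupy (back-of-envelope arithmetic only), which makes me suspect the call is asking R for far more than one weight per record:

2000000 * 2000000 * 8 / 2^30
# [1] 29802.32  (in Gb -- essentially the 29802.3 Gb in the error message)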

This computer has 32 GB of RAM (31.9 GB usable max). Further information about my computer and session:

> sessionInfo()
R version 4.0.4 (2021-02-15)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252    LC_MONETARY=English_Australia.1252
[4] LC_NUMERIC=C                       LC_TIME=English_Australia.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] tictoc_1.0        stringr_1.4.0     readxl_1.3.1      readr_1.4.0       questionr_0.7.4   lubridate_1.7.10 
 [7] HeatStress_1.0.7  magrittr_2.0.1    forecast_8.14     dplyr_1.0.5       data.table_1.14.0 arsenal_3.6.2    

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.6        lattice_0.20-41   zoo_1.8-9         assertthat_0.2.1  digest_0.6.27     lmtest_0.9-38    
 [7] psych_2.0.12      utf8_1.2.1        mime_0.10         cellranger_1.1.0  R6_2.5.0          labelled_2.8.0   
[13] ggplot2_3.3.3     highr_0.8         pillar_1.5.1      rlang_0.4.10      curl_4.3          rstudioapi_0.13  
[19] miniUI_0.1.1.1    fracdiff_1.5-1    TTR_0.24.2        munsell_0.5.0     tinytex_0.30      shiny_1.6.0      
[25] compiler_4.0.4    httpuv_1.5.5      xfun_0.21         pkgconfig_2.0.3   mnormt_2.0.2      tmvnsim_1.0-2    
[31] urca_1.3-0        htmltools_0.5.1.1 nnet_7.3-15       tidyselect_1.1.0  tibble_3.1.0      quadprog_1.5-8   
[37] fansi_0.4.2       crayon_1.4.1      later_1.1.0.1     grid_4.0.4        nlme_3.1-152      xtable_1.8-4     
[43] gtable_0.3.0      lifecycle_1.0.0   DBI_1.1.1         scales_1.1.1      quantmod_0.4.18   cli_2.3.1        
[49] stringi_1.5.3     promises_1.2.0.1  tseries_0.10-48   timeDate_3043.102 ellipsis_0.3.1    xts_0.12.1       
[55] generics_0.1.0    vctrs_0.3.6       forcats_0.5.1     tools_4.0.4       glue_1.4.2        purrr_0.3.4      
[61] hms_1.0.0         parallel_4.0.4    fastmap_1.1.0     colorspace_2.0-0  haven_2.3.1  

When attempting the command with either the reproducible example or the actual data, my computer's memory usage jumps from 14 GB to the maximum, which is likely related to the issue. (Screenshot of the memory usage not shown.)

EDIT: The example weight is a matrix, but actual.weight is a data frame. Converting between the two classes changes the size reported in the error message:

new.value.fn(variable, year, location, as.data.frame(weight))
# Error: cannot allocate vector of size 15.3 Mb
new.value.fn(actual.var, actual.year, actual.location, as.matrix(actual.weight))
# Error: cannot allocate vector of size 31275.5 Gb

With the matrix, the reported vector size does actually exceed the computer's capacity, which suggests that R greatly underestimated the 15.6 Mb vector. Why matrix vs data frame makes such a large difference in the estimate, I don't know (and I still need to work out how to carry out the function; see the sketch below).
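
In case it clarifies what I am ultimately after, here is my current guess at a per-record version of the lookup, using a two-column index matrix. This is untested at full scale, and it assumes the fractional year offsets and location values can simply be truncated to valid integer row/column indices:

# hypothetical element-wise version: cbind(row, location) picks one
# (row, column) pair per record instead of every combination
new.value.fn2 <- function(variable, year, location, weight) {
  row <- year - 2000
  variable * weight[cbind(row, location)]  # fractional indices get truncated to integers
}
new.value.fn2(variable, year, location, weight)  # example data; actual.weight would presumably need as.matrix() first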

MBorg
  • Could you make a reproducible example that yields the error? It seems that it could be something specific to your system that you could investigate – csgroen Mar 18 '21 at 09:27
  • Without a reprex hard to say. Looking at your function you have objects that are not passed as arguments. When R goes to the global environment to look for those objects you might be getting some recursion that is blowing up your memory. – MDEWITT Mar 18 '21 at 09:31
  • Added a reproducible example – MBorg Mar 18 '21 at 09:33
  • What are the global variables `data` and `weight` used in the function? – Andrew Chisholm Mar 18 '21 at 10:04
  • Changed the code so that all the arguments are passed through the function, so there is no searching through the global environment. I have now included examples for all the variables with 2 million values. Strangely, the example variables now show a "normal" size that the computer can't handle, whilst the real ones still show 15.6 Mb. I wouldn't be able to fit >2 million elements from the real variables online. – MBorg Mar 18 '21 at 10:25

0 Answers