I'm working with a >1 GB data set and running into out-of-memory ("Cannot allocate...") errors when graphing with ggplot2. While researching where all my memory is going (with the help of sources like this, this, and this), I've found that the following code with dummy data causes significant memory usage that, according to the Windows Task Manager, is not released even after repeated calls to gc().
```r
# memory.size() is Windows-specific; it reports the Mb currently in use by R
print(begMemSize <- memory.size())
library(ggplot2)
numRows <- 1e6
df <- data.frame(
  x1 = runif(numRows),
  x2 = runif(numRows),
  xGroup = factor(trunc(runif(numRows, 1, 6)))
)
df$y <- df$x1 + df$x2
gc()
print(mid1MemSize <- memory.size())

# This is fine
ggplot(data = df, mapping = aes(x = x1)) +
  geom_smooth(mapping = aes(y = y))
gc()
print(mid2MemSize <- memory.size())

# This makes memory.size() explode
ggplot(data = df, mapping = aes(x = x1)) +
  geom_smooth(mapping = aes(y = y)) +
  geom_hline(mapping = aes(yintercept = 0.25))
gc()
print(endMemSize <- memory.size())
```
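The plot that makes memory.size() explode differs from the fine one only by its geom_hline() layer. One way to probe this might be to count the rows in that layer's built data with ggplot2's layer_data(). The sketch below is a diagnostic under stated assumptions, not a confirmed explanation: it uses a smaller data frame (1e5 rows instead of 1e6), omits the smoothing layer for speed, and compares the constant mapped inside aes() with the same constant passed as the plain yintercept argument.

```r
library(ggplot2)

# Assumption: 1e5 rows instead of the original 1e6, so the check runs quickly
numRows <- 1e5
df <- data.frame(x1 = runif(numRows), x2 = runif(numRows))
df$y <- df$x1 + df$x2

# Constant mapped inside aes(): the layer takes its data from df,
# so the constant is recycled to one row per row of df
p_aes <- ggplot(data = df, mapping = aes(x = x1)) +
  geom_hline(mapping = aes(yintercept = 0.25))

# Constant passed as an argument: geom_hline() builds its own one-row data
p_param <- ggplot(data = df, mapping = aes(x = x1)) +
  geom_hline(yintercept = 0.25)

# layer_data() builds the plot and returns the data for layer 1 (the hline)
n_aes <- nrow(layer_data(p_aes, 1))
n_param <- nrow(layer_data(p_param, 1))
print(c(n_aes = n_aes, n_param = n_param))
```

If n_aes equals numRows while n_param is 1, the aes() version would be drawing one identical horizontal line per row of df. That is a plausible contributor to the extra memory, but it is a hypothesis, not a measurement of the original session.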
The expression c(begMemSize, mid1MemSize, mid2MemSize, endMemSize) returns:

```
[1] 50.62 102.30 199.22 1208.39
```
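The size of the jump is clearer as step-by-step differences. A quick check in R using the four readings above (values copied from the output, in Mb):

```r
# memory.size() readings from the output above, in Mb
mem <- c(begMemSize = 50.62, mid1MemSize = 102.30,
         mid2MemSize = 199.22, endMemSize = 1208.39)

# Increase caused by each step: loading ggplot2 and creating df,
# drawing the first plot, drawing the second plot
steps <- diff(mem)
print(round(steps, 2))
```

That works out to roughly 52 Mb for loading ggplot2 and creating the data, 97 Mb for the first plot, and about 1009 Mb for the second, roughly ten times the first plot's increase.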
Note the huge jump in the last number. That last number matches the readings in Windows Task Manager (very close to "Memory (active working set)" and only slightly lower than "Commit size" in the Details tab). Sometimes, with repeated calls to gc(), I can get memory.size() to go down in R, but the Windows Task Manager readings do not go down. I worry that my out-of-memory errors are related to this, but my immediate questions are:
- Why is this happening?
- Is there any way to get the Windows Task Manager memory readings to go down in this situation (without, obviously, closing R and losing all the processed data in memory)?
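To separate R's own accounting from what Windows reports, it may help to compare gc()'s figures with memory.size() or Task Manager. gc() returns a matrix whose second column is the Mb used by R's Ncells and Vcells, so summing it gives R's view of live objects. The helper name heap_used_mb() is hypothetical. The sketch allocates and frees a roughly 76 Mb vector and shows that R's own count drops back after rm(); it says nothing about whether the process hands that memory back to Windows, which is exactly the gap described above.

```r
# Hypothetical helper: Mb used by live R objects, as reported by gc().
# Column 2 of gc()'s result is the "(Mb)" column for currently used cells.
heap_used_mb <- function() {
  sum(gc()[, 2])
}

before <- heap_used_mb()
big <- numeric(1e7)       # 1e7 doubles = 8e7 bytes, about 76 Mb
during <- heap_used_mb()
rm(big)
after <- heap_used_mb()   # heap_used_mb() calls gc(), so 'big' is collected here

print(c(before = before, during = during, after = after))

# memory.size() is Windows-only and became a stub in R 4.2.0; on older
# Windows builds, compare it with 'after' to see memory R no longer uses
# for objects but still reports as in use.
if (.Platform$OS.type == "windows" && getRversion() < "4.2.0") {
  print(c(r_heap = after, process = memory.size()))
}
```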
sessionInfo() output (using RStudio 1.3.1056):

```
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

Random number generation:
 RNG:     Mersenne-Twister
 Normal:  Inversion
 Sample:  Rounding

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] ggplot2_3.3.2

loaded via a namespace (and not attached):
 [1] rstudioapi_0.11 magrittr_1.5 splines_4.0.2 tidyselect_1.1.0 munsell_0.5.0 colorspace_1.4-1 lattice_0.20-41 R6_2.4.1 rlang_0.4.6 dplyr_1.0.0 tools_4.0.2 grid_4.0.2
[13] gtable_0.3.0 nlme_3.1-148 mgcv_1.8-31 withr_2.2.0 ellipsis_0.3.1 digest_0.6.25 tibble_3.0.1 lifecycle_0.2.0 crayon_1.3.4 Matrix_1.2-18 farver_2.0.3 purrr_0.3.4
[25] vctrs_0.3.1 glue_1.4.1 labeling_0.3 compiler_4.0.2 pillar_1.4.4 generics_0.0.2 scales_1.1.1 pkgconfig_2.0.3
```