
I am trying to update my data package by adding more data, but the installation now fails with a memory error. I am sure others have built packages with larger data sets. How do people get around this problem?

==> R CMD INSTALL --no-multiarch --with-keep.source cfsales

* installing to library ‘/usr/local/lib/R/site-library’
* installing *source* package ‘cfsales’ ...
** using staged installation
** R
** data
*** moving datasets to lazyload DB
Error in lazyLoadDBinsertVariable(vars[i], from, datafile, ascii, compress,  : 
  long vectors not supported yet: connections.c:5984
ERROR: lazydata failed for package ‘cfsales’
* removing ‘/usr/local/lib/R/site-library/cfsales’
Warning in q("no", status = status, runLast = FALSE) :
  system call failed: Cannot allocate memory

Here is a list of what is inside my data directory along with file sizes in MB.

total 188M
-rw-rw-r-- 1 ahallam ahallam   1M Jun 17 20:18 holidays_events.rda
-rw-rw-r-- 1 ahallam ahallam   1M Jun 17 20:18 items.rda
-rw-rw-r-- 1 ahallam ahallam   1M Jun 17 20:18 oil.rda
-rw-rw-r-- 1 ahallam ahallam   1M Jun 20 08:51 store_day_sales.rda
-rw-rw-r-- 1 ahallam ahallam   1M Jun 17 20:18 stores.rda
-rw-rw-r-- 1 ahallam ahallam   2M Jun 19 17:23 test.rda
-rw-rw-r-- 1 ahallam ahallam 185M Jun 19 17:36 train.rda
-rw-rw-r-- 1 ahallam ahallam   1M Jun 17 20:18 transactions.rda

This is my sessionInfo

> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] cfsales_0.0.0.9000

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1       clisymbols_1.2.0 packrat_0.5.0    crayon_1.3.4     withr_2.1.2      rprojroot_1.3-2  assertthat_0.2.1 R6_2.4.0        
 [9] backports_1.1.4  git2r_0.25.2     magrittr_1.5     rlang_0.3.4      rstudioapi_0.10  fs_1.3.1         testthat_2.1.1   desc_1.2.0      
[17] tools_3.6.0      glue_1.3.1       pkgload_1.0.2    compiler_3.6.0   usethis_1.5.0
Alex
  • Possible duplicate of [Unable to install.packages(): system call failed: Cannot allocate memory; installation of package had non-zero exit status](https://stackoverflow.com/questions/27136264/unable-to-install-packages-system-call-failed-cannot-allocate-memory-instal) – NelsonGon Jun 20 '19 at 13:46
  • Looks like you're on GNU/Linux, the above post applies to the same. – NelsonGon Jun 20 '19 at 13:48
  • That was a good link. It turns out that my swap was already at 60 and I restarted my computer, closed all my apps, and still had the same issue. I am confused why my package with 188MB is taking all my 16GB of RAM. Do you know where I can learn more? @NelsonGon – Alex Jun 20 '19 at 14:00
  • Might try dter's comment on the accepted answer or ask over at Super User. Not sure about their policies though. Alternatively, someone might see this post later and help you solve it. – NelsonGon Jun 20 '19 at 14:04
  • The phrase "long vectors not supported yet" jumps out at me. What exactly is *in* your data? Which R version are we dealing with here? – MrFlick Jun 20 '19 at 15:16
  • @MrFlick I just uploaded my sessionInfo. Also, I deleted the massive 185M `train.rda` file and everything worked out fine. When that file is uncompressed it is 4.7GB as a csv. I would like to include that file in the package, but it seems that may not be an option. – Alex Jun 20 '19 at 17:35

1 Answer


As noted in the comments, the message

long vectors not supported yet: connections.c:5984

is suspicious. Looking at that line we see

inlen = LENGTH(in);

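where LENGTH (as opposed to XLENGTH) does not support long vectors, i.e. vectors with more than 2^31 - 1 elements. At some point during installation your data is therefore represented by a vector longer than this. The likely culprit is train.rda: according to the comments it is 4.7GB as a csv, and the installation succeeds once that file is removed.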
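
To get a feel for the limit described above, here is a minimal sketch in R. The helper `exceeds_int_limit()` is hypothetical (it is not part of R or of the cfsales package), and it uses `object.size()` only as a rough proxy: the in-memory size of an object is not the same as the size of the vector R builds internally while installing, so treat the result as a hint rather than a definitive test.

```r
# Largest length a non-long vector can have: 2^31 - 1
int_max <- .Machine$integer.max

# Hypothetical helper (an assumption for illustration, not an R API):
# TRUE if the approximate in-memory size of `x`, in bytes, is above `limit`.
exceeds_int_limit <- function(x, limit = int_max) {
  as.numeric(utils::object.size(x)) > limit
}

# Small stand-in data; the real `train` data set is not reproduced here.
small <- data.frame(id = 1:10, sales = runif(10))

exceeds_int_limit(small)              # tiny data frame, far below the limit
exceeds_int_limit(small, limit = 10)  # artificially low limit to show the other case
```

On the real data, one could load data/train.rda into a session and pass the resulting object to the helper; an object in the multi-gigabyte range would be a plausible candidate for the failing vector.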

Ralf Stubner