
I would like to include two 460 x 5000 numeric matrices in my R package. Following the instructions in "How to effectively deal with uncompressed saves during package check?", I saved the objects as:

save(mat1,file="mat1.rda",compress="xz")
save(mat2,file="mat2.rda",compress="xz")

However, the resulting files are quite large (8.7 MB and 8.9 MB), and R CMD check --as-cran gives me this note:

 * checking installed package size ... NOTE
   installed size is 20.1Mb
   sub-directories of 1Mb or more:
   data  20.0Mb

As I understand it, one cannot submit an R package to CRAN unless it passes R CMD check --as-cran without any notes or warnings. Is there a way to compress the datasets further?
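To see how much each compression method actually helps, the file sizes can be compared directly. The following is only a sketch: the matrix is random stand-in data rather than the real `mat1`, `rda_size` is a hypothetical helper, and the rounding step assumes that two decimal places are acceptable for the analysis, which may not hold for real data.

```r
# Illustrative data only: a small random matrix standing in for mat1.
set.seed(1)
mat <- matrix(rnorm(200 * 500), nrow = 200, ncol = 500)

# Hypothetical helper: save an object to a temporary .rda file
# and return the size of that file in bytes.
rda_size <- function(obj, compress) {
  path <- tempfile(fileext = ".rda")
  on.exit(unlink(path))
  save(obj, file = path, compress = compress)
  file.size(path)
}

sizes <- c(
  gzip  = rda_size(mat, "gzip"),
  bzip2 = rda_size(mat, "bzip2"),
  xz    = rda_size(mat, "xz"),
  # Assumption: two decimal places are enough for the analysis.
  xz_rounded = rda_size(round(mat, 2), "xz")
)
print(sizes)
```

For files that already sit in the package's data directory, `tools::resaveRdaFiles("data", compress = "auto")` tries the available methods and keeps the smallest result, and `tools::checkRdaFiles("data")` reports the size and compression of each file. Full-precision doubles tend to compress poorly whatever the method, so the gain from switching methods alone may be modest.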

FairyOnIce

2 Answers


Is it really necessary to include those files? I see several options:

  • Include a smaller subset of the matrix, which you use in the examples.
  • Generate a matrix on-the-fly, e.g. with random numbers.
  • Put the files somewhere for download, and ensure that the examples do not execute.
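The first two options could look roughly like the sketch below. The full matrix is simulated here as a stand-in for the real data, the subset dimensions (50 x 200) are an arbitrary choice, and `simulate_matrix` is a hypothetical helper, not part of any package.

```r
# Stand-in for the real matrix; in the package this would be the full mat1.
set.seed(42)
mat1 <- matrix(rnorm(460 * 5000), nrow = 460, ncol = 5000)

# Option 1: keep a small subset for examples and tests.
# The subset dimensions are an arbitrary choice for illustration.
mat1_small <- mat1[seq_len(50), seq_len(200)]
path <- file.path(tempdir(), "mat1_small.rda")
save(mat1_small, file = path, compress = "xz")

# Option 2: simulate a matrix on the fly inside the examples.
# simulate_matrix is a hypothetical helper name.
simulate_matrix <- function(nrow = 460, ncol = 5000, seed = 1) {
  set.seed(seed)
  matrix(rnorm(nrow * ncol), nrow = nrow, ncol = ncol)
}
sim <- simulate_matrix(nrow = 50, ncol = 200)
```

In a real package the subset would be saved under `data/` rather than in a temporary directory.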
Paul Hiemstra
  • I use the whole dataset in my analysis, and it is a real dataset, so the first two options do not work for me. It would be nice to have everything used in my manuscript (code and data) in one place. However, I understand that this is too much to ask, and putting the dataset somewhere else is an alternative way to make it public. Thank you. – FairyOnIce Apr 22 '14 at 08:00
  • I would include a subset of the data just to show how the code works and to run some tests. In addition, I would release the entire dataset for the sake of reproducibility. – Paul Hiemstra Apr 22 '14 at 08:01

Consider distributing the data in a separate data package that will be built, uploaded and installed only once (hopefully). Compare this to the efforts required to retransfer the same data over and over again as you update your package.

(Of course, this applies only if you intend to supply updates to your package. There's no difference if your code is perfect right from the start ;-) )
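With a separate data package, the code package can check for the data at run time instead of depending on it being present. The following is a minimal sketch, assuming the companion package is called `mypkgData` (a hypothetical name) and ships a dataset named `mat1`; `load_full_data` is likewise a made-up helper.

```r
# "mypkgData" is a hypothetical name for the companion data package.
load_full_data <- function(pkg = "mypkgData", name = "mat1") {
  # Fall back gracefully when the data package is not installed.
  if (!requireNamespace(pkg, quietly = TRUE)) {
    message("Install '", pkg, "' to access the full data set.")
    return(invisible(NULL))
  }
  # Load the dataset into a private environment and return it.
  env <- new.env()
  utils::data(list = name, package = pkg, envir = env)
  get(name, envir = env)
}

full <- load_full_data()
if (is.null(full)) {
  # Examples and tests can switch to a small bundled subset here.
  message("Running with the reduced example data.")
}
```

Listing the data package under `Suggests` rather than `Depends` would fit this pattern, since the code package then still installs and checks without it.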

krlmlr