I have the following two data frames:

First one

                key X         Club  EXPG1  EXPG2 tijd      datum  NR
1 215-17-04-02-2015 1 ADO Den Haag 1.3575 1.3023   17 04-02-2015 215
2 215-17-04-02-2015 2 ADO Den Haag 1.3575 1.3023   17 04-02-2015 215
3 215-17-04-02-2015 3 ADO Den Haag 1.3575 1.3023   17 04-02-2015 215
4 215-17-04-02-2015 4 ADO Den Haag 1.3575 1.3023   17 04-02-2015 215
5 215-17-04-02-2015 5 ADO Den Haag 1.3575 1.3023   17 04-02-2015 215
6 215-17-04-02-2015 6 ADO Den Haag 1.3575 1.3023   17 04-02-2015 215

Second one

                    key  V1         V2 V3  V4 V5 V6  V7 V8 V9 V10 V11 V12 V13 V14  V15 V16 V17
 17532 210-1-01-01-2013 210 01-01-2013  1 210 80 80 130 91 NA  77   0   0  10   7 9983  58   8
 17533 210-2-01-01-2013 210 01-01-2013  2 220 70 70 120 88 NA  81   0   0   5  10 9987  41   8
 17534 210-3-01-01-2013 210 01-01-2013  3 230 80 70 120 93 NA  81   0   0   7  12 9985  65   8
 17535 210-4-01-01-2013 210 01-01-2013  4 240 60 60 100 93 NA  84   0   0   0   0 9984  65   8
 17536 210-5-01-01-2013 210 01-01-2013  5 250 50 60 100 90 NA  82   0   0   1   1 9986  63   8

What I would like to do now is merge the two: take the V2 and V3 values from the second data frame (df2) and attach them to the matching rows of the first (df1).

Therefore I try:

 df <- merge(df1[,c("key")], df2[,c("key", "V1", "V2", "V3", "V4")])

This, however, brings up the following error:

 Error: cannot allocate vector of size 17.9 Gb
 In addition: Warning messages:
 1: In rep.int(rep.int(seq_len(nx), rep.int(rep.fac, nx)), orep) :
   Reached total allocation of 4011Mb: see help(memory.size)
 2: In rep.int(rep.int(seq_len(nx), rep.int(rep.fac, nx)), orep) :
   Reached total allocation of 4011Mb: see help(memory.size)
 3: In rep.int(rep.int(seq_len(nx), rep.int(rep.fac, nx)), orep) :
   Reached total allocation of 4011Mb: see help(memory.size)
 4: In rep.int(rep.int(seq_len(nx), rep.int(rep.fac, nx)), orep) :
   Reached total allocation of 4011Mb: see help(memory.size)

Could anybody advise me on what is wise to do now? I assume there must be a way to deal with datasets larger than 15 GB, right?

Frank Gerritsen
  • Possible duplicate of [R memory management / cannot allocate vector of size n Mb](http://stackoverflow.com/questions/5171593/r-memory-management-cannot-allocate-vector-of-size-n-mb) – nrussell Jan 04 '16 at 19:21
  • 2
    Check out the `data.table` package. – mrp Jan 04 '16 at 19:21
  • 1
    I think that the `df <- merge` call will consume additional memory, even in `data.table`. A better bet would be to use sql. Overall, I think you may not be able to do this in R as your resultant data.frame is 17 GB, while you only have 4GB on your machine (assuming R has access to all available memory) – Chris Jan 04 '16 at 19:33
  • 1
    Anyway you should buy more memory. 4 Gb is peanuts for people doing serious data management. – MichaelChirico Jan 04 '16 at 19:34
  • @mrp will do. Thanks for the tip – Frank Gerritsen Jan 04 '16 at 19:56
  • 1
    Is there any opportunity to achieve your work in pieces? If you have too little memory you will need to think about chunking the information for processing which can be problematic. Basically if you can do the work in pieces, then your total datasize is a non-issue. Is it necessary to load in all 15 Gigs at once? – Badger Jan 04 '16 at 20:26
  • 1
    the `ff` package along with `ffbase` allow for data.frames that exceed memory. They are stored on disk and can be transparently slurped into memory when needed. The `ffbase` package provides a subset of common R operations that work on the "ff data.frame" objects. – Zelazny7 Jan 04 '16 at 20:35
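
Following up on the `data.table` suggestion, here is a minimal sketch. It assumes `df1` and `df2` are the data frames shown above and that `key` uniquely identifies rows in `df2`; duplicated keys on both sides would still explode the join. Note also that `df1[,c("key")]` drops to a plain vector, so the `merge()` call above had no `key` column to match on and likely fell back to a full Cartesian product, which is where the 17.9 Gb request comes from.

 library(data.table)

 dt1 <- as.data.table(df1)
 dt2 <- as.data.table(df2)

 # Keep only the needed columns of df2 so the intermediate stays small,
 # and key it for a fast join.
 slim <- dt2[, .(key, V2, V3)]
 setkey(slim, key)

 # slim[dt1, on = "key"] returns one row per row of df1, with the
 # matching V2 and V3 values from df2 attached.
 result <- slim[dt1, on = "key"]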
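
For the SQL route Chris mentions, one option is an on-disk SQLite database via the DBI and RSQLite packages: the database performs the join on disk, and the result can be fetched in batches instead of all at once. A sketch, in which the file name `merged.db` and the batch size are placeholders:

 library(DBI)

 con <- dbConnect(RSQLite::SQLite(), "merged.db")
 dbWriteTable(con, "df1", df1)
 dbWriteTable(con, "df2", df2)

 # Stream the joined result in batches with dbSendQuery()/dbFetch()
 # rather than pulling the whole 17 GB into memory with dbGetQuery().
 rs <- dbSendQuery(con, "
   SELECT df1.*, df2.V2, df2.V3
   FROM df1 JOIN df2 ON df1.key = df2.key")
 while (!dbHasCompleted(rs)) {
   chunk <- dbFetch(rs, n = 100000)
   # ... process or write out each chunk here ...
 }
 dbClearResult(rs)
 dbDisconnect(con)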
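
Badger's chunking idea works in base R as well: merge `df1` against `df2` one slice at a time and append each partial result to a file, so the full result never has to exist in memory at once. A rough sketch; the chunk size and output file name are arbitrary:

 chunk_size <- 100000L
 starts <- seq(1L, nrow(df1), by = chunk_size)

 for (i in seq_along(starts)) {
   rows  <- starts[i]:min(starts[i] + chunk_size - 1L, nrow(df1))
   piece <- merge(df1[rows, "key", drop = FALSE],  # drop = FALSE keeps a data.frame
                  df2[, c("key", "V2", "V3")],
                  by = "key")
   write.table(piece, "merged.csv", sep = ",", row.names = FALSE,
               col.names = (i == 1L),  # header only on the first chunk
               append    = (i > 1L))
 }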
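
Finally, a sketch of the `ff`/`ffbase` route: the tables are read into on-disk `ffdf` objects and merged there, so RAM only ever holds small chunks. The CSV file names are placeholders, and `?ffbase::merge.ffdf` should be checked for the join types it supports:

 library(ff)
 library(ffbase)

 # read.csv.ffdf reads the files chunk-wise into on-disk ffdf objects
 ff1 <- read.csv.ffdf(file = "df1.csv")
 ff2 <- read.csv.ffdf(file = "df2.csv")

 # ffbase supplies a merge() method for ffdf objects; the result
 # stays on disk rather than in RAM
 res <- merge(ff1, ff2, by = "key")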
