
I have a large data frame of 3 million rows x 4 columns of character and integer values. When I save it with the base R save() command, the file takes up 16 MB of disk space.

I then bind a small but otherwise identical data frame (1500 rows x 4 columns of character and integer values) onto the end of the large one and save the result again.

Everything works, but the file now takes up 24 MB. Does anyone have any idea why this happens? I'm working with many millions of observations, so keeping the file size (and processing time) down is important to me.

str() output for the two data frames:

> str(bluetooth_oct)
'data.frame':   3069577 obs. of  4 variables:
 $ timestamp   : int  1380574809 1380574842 1380574852 1380574852 1380574864 1380574873 1380574890 1380574901 1380574901 1380574901 ...
 $ scanned_user: chr  "729d6181f70676b50921b11d2b0009" "792b94ad80885c219a53366de477d8" "e2f169c1af5636f137fa5cc8565bff" "02fbc27420b2c30e451b2457f22141" ...
 $ user        : chr  "30383e7d47ff768d56639c31ac2664" "c7db19a439bd43bf467912f56453d7" "7eab3d34a4f9cc42e6c3e3d2de0b92" "7eab3d34a4f9cc42e6c3e3d2de0b92" ...
 $ rssi        : int  -92 -76 -95 -70 -90 -97 -82 -63 -91 -90 ...

> str(bluetooth_oct2)
'data.frame':   3068039 obs. of  4 variables:
 $ timestamp   : int  1380574809 1380574842 1380574852 1380574852 1380574864 1380574873 1380574890 1380574901 1380574901 1380574901 ...
 $ scanned_user: chr  "729d6181f70676b50921b11d2b0009" "792b94ad80885c219a53366de477d8" "e2f169c1af5636f137fa5cc8565bff" "02fbc27420b2c30e451b2457f22141" ...
 $ user        : chr  "30383e7d47ff768d56639c31ac2664" "c7db19a439bd43bf467912f56453d7" "7eab3d34a4f9cc42e6c3e3d2de0b92" "7eab3d34a4f9cc42e6c3e3d2de0b92" ...
 $ rssi        : int  -92 -76 -95 -70 -90 -97 -82 -63 -91 -90 ...
smci
Bornakke
  • Without any sample data, all we can do is guess. My guess is that the second object has attributes or something else you can't see when you print it. – Joshua Ulrich Sep 17 '15 at 11:56
  • 2
    rbind allocates a lot of memory that it does not need. See this post and look at the accepted answer first: http://stackoverflow.com/questions/7093984/memory-efficient-alternative-to-rbind-in-place-rbind – pcantalupo Sep 17 '15 at 12:48
  • 1
    @pcantalupo: the fact that `rbind` uses a lot of RAM cannot explain why the saved object takes 50% more disk space for a marginal increase in the number of rows. – Joshua Ulrich Sep 17 '15 at 23:02
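
Following up on the suggestion that the combined object may carry attributes that str() does not print, here is a minimal sketch of how one might check. It is not a confirmed diagnosis. The data frames `big` and `small`, the helper `saved_size()`, and the row counts are all hypothetical stand-ins for the real data. The row-names check is only one candidate worth ruling out, since a data frame can store automatic row names in a compact form or as a full vector.

```r
# Hypothetical stand-ins for the real data frames (much smaller on purpose)
big <- data.frame(
  timestamp = 1:1000,
  user      = sprintf("%030d", 1:1000),
  rssi      = rep(-90L, 1000),
  stringsAsFactors = FALSE
)
small <- data.frame(
  timestamp = 1001:1010,
  user      = sprintf("%030d", 1001:1010),
  rssi      = rep(-80L, 10),
  stringsAsFactors = FALSE
)
combined <- rbind(big, small)

# Hypothetical helper: save one object with save() and report the file size in bytes
saved_size <- function(obj) {
  path <- tempfile(fileext = ".RData")
  on.exit(unlink(path))
  save(obj, file = path)
  file.size(path)
}

# 1. Attributes that str() does not display
print(names(attributes(big)))
print(names(attributes(combined)))

# 2. How the row names are stored: a negative value means
#    automatic (compactly stored) row names
print(.row_names_info(big))
print(.row_names_info(combined))

# 3. In-memory size versus size on disk
print(object.size(big))
print(object.size(combined))
size_before_reset <- saved_size(combined)

# 4. Reset to automatic row names and measure again
rownames(combined) <- NULL
size_after_reset <- saved_size(combined)
print(c(before = size_before_reset, after = size_after_reset))
```

If the two saved sizes differ noticeably on the real data, row names are a likely contributor. If they do not, comparing `attributes()` of the two objects element by element would be the next thing to try. Independently of the cause, save() accepts a `compress` argument (for example `compress = "xz"`), which may reduce file size at the cost of slower writes.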

0 Answers