I have a big data frame containing 3 million rows x 4 columns of character and integer values. When I save it with R's base save() command, the file takes up 16 MB.
I then rbind a small but otherwise identical data frame (1500 rows x 4 columns of character and integer values) onto the end of the big one and save the result again.
Everything works, but the file now takes up 24 MB. Does anyone have any idea why this happens? I'm working with many millions of observations, so keeping the size (and processing time) down is somewhat important to me.
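Roughly, the workflow looks like this (a minimal sketch; the file names and the read.csv() step are illustrative, not my exact code):

# Illustrative sketch of the workflow; file names are made up for this post.
save(bluetooth_oct2, file = "before.RData")      # ~16 MB on disk

# 'extra' stands in for the small frame: ~1500 rows with the same
# 4 columns (timestamp, scanned_user, user, rssi).
extra <- read.csv("extra_scans.csv", stringsAsFactors = FALSE)

bluetooth_oct <- rbind(bluetooth_oct2, extra)    # append the small frame
save(bluetooth_oct, file = "after.RData")        # now ~24 MB on disk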
Here is str() of the two data frames:
> str(bluetooth_oct)
'data.frame': 3069577 obs. of 4 variables:
$ timestamp : int 1380574809 1380574842 1380574852 1380574852 1380574864 1380574873 1380574890 1380574901 1380574901 1380574901 ...
$ scanned_user: chr "729d6181f70676b50921b11d2b0009" "792b94ad80885c219a53366de477d8" "e2f169c1af5636f137fa5cc8565bff" "02fbc27420b2c30e451b2457f22141" ...
$ user : chr "30383e7d47ff768d56639c31ac2664" "c7db19a439bd43bf467912f56453d7" "7eab3d34a4f9cc42e6c3e3d2de0b92" "7eab3d34a4f9cc42e6c3e3d2de0b92" ...
$ rssi : int -92 -76 -95 -70 -90 -97 -82 -63 -91 -90 ...
> str(bluetooth_oct2)
'data.frame': 3068039 obs. of 4 variables:
$ timestamp : int 1380574809 1380574842 1380574852 1380574852 1380574864 1380574873 1380574890 1380574901 1380574901 1380574901 ...
$ scanned_user: chr "729d6181f70676b50921b11d2b0009" "792b94ad80885c219a53366de477d8" "e2f169c1af5636f137fa5cc8565bff" "02fbc27420b2c30e451b2457f22141" ...
$ user : chr "30383e7d47ff768d56639c31ac2664" "c7db19a439bd43bf467912f56453d7" "7eab3d34a4f9cc42e6c3e3d2de0b92" "7eab3d34a4f9cc42e6c3e3d2de0b92" ...
$ rssi : int -92 -76 -95 -70 -90 -97 -82 -63 -91 -90 ...
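In case it's relevant, this is how the in-memory and on-disk sizes can be compared, and how a stronger compressor can be tried (file names are the illustrative ones from above; save() compresses with gzip by default):

# In-memory sizes of the two data frames
object.size(bluetooth_oct2)
object.size(bluetooth_oct)

# On-disk sizes in MB
file.size("before.RData") / 1024^2
file.size("after.RData")  / 1024^2

# A stronger compressor, to see whether the size gap persists
save(bluetooth_oct, file = "after_xz.RData", compress = "xz")
file.size("after_xz.RData") / 1024^2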