
I have a very big boolean vector (like a=c(TRUE, TRUE, FALSE, FALSE) but many times larger) and would like to store it in a file as compactly as possible. What is the easiest way to do it?

Thanks,

smci
user29514
  • What do you want to do with the file? Read it with R again or process it with a different program? I would look at this question: http://stackoverflow.com/questions/1635278/saving-a-data-frame-as-a-binary-file – Thilo Nov 18 '13 at 10:23
  • @Thilo: For now I want to store these files on disk so that they take up as little space as possible. – user29514 Nov 18 '13 at 10:32
  • @user1317221_G: your suggestion produces a file of 46 bytes, while `a` can be stored in less than 1 byte. – user29514 Nov 18 '13 at 10:34
  • Why not use `save` and then `gzip` the resulting file for good measure? – musically_ut Nov 18 '13 at 11:04
  • @musically_ut `save` has a `compress` argument that can gzip its contents. – Richie Cotton Nov 18 '13 at 11:30
  • Thanks for the answers. To clarify, I want to save the vector as a binary file in which every TRUE is stored as a 1 bit and every FALSE as a 0 bit. In that case a boolean vector of length 8*N will use N bytes on the hard drive. – user29514 Nov 18 '13 at 12:40

1 Answer


As the linked question suggests, saving as a binary rds file using saveRDS is the best option, provided that you only want to use the resulting file with R, rather than any other programs.

If your vector doesn't have any missing values (a bit vector cannot represent NA), you can convert the logical vector to a bit vector using the bit package. In the example below, that takes up half as much space on disk. (It also uses less memory in your workspace.)

library(bit)             # provides as.bit()
x <- runif(1e6) > 0.5    # logical vector of length 1e6
x2 <- as.bit(x)          # same values, stored as a bit vector
saveRDS(x, "x.rds")      # takes up 246 KB
saveRDS(x2, "x2.rds")    # takes up 123 KB

If you need to reuse the variable in other programs, then choose a format that those programs can read. HDF5 is a common, compact format that may be suitable.
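For the one-bit-per-value layout asked about in the comments, a minimal base R sketch might look like the following. The helper names write_bits and read_bits and the 4-byte length header are assumptions made for illustration, not part of the answer above; the sketch also assumes the vector has no NA values and fewer than 2^31 elements.

```r
# Hypothetical helpers (names are illustrative); base R only.
write_bits <- function(x, path) {
  stopifnot(is.logical(x), !anyNA(x))
  n <- length(x)
  # packBits() needs a length that is a multiple of 8, so pad with FALSE.
  padded <- c(x, logical((8 - n %% 8) %% 8))
  con <- file(path, "wb")
  on.exit(close(con))
  # Assumed header: original length as a 4-byte little-endian integer,
  # so the padding can be dropped again when reading.
  writeBin(as.integer(n), con, size = 4L, endian = "little")
  # Each group of 8 values becomes one byte (first value = lowest bit).
  writeBin(packBits(padded, type = "raw"), con)
}

read_bits <- function(path) {
  con <- file(path, "rb")
  on.exit(close(con))
  n <- readBin(con, what = "integer", n = 1L, size = 4L, endian = "little")
  bytes <- readBin(con, what = "raw", n = ceiling(n / 8))
  # rawToBits() expands each byte back into 8 values of 00 or 01.
  (as.integer(rawToBits(bytes)) == 1L)[seq_len(n)]
}

# Round trip on a temporary file.
x <- runif(1e6) > 0.5
path <- tempfile(fileext = ".bin")
write_bits(x, path)
stopifnot(identical(read_bits(path), x))
file.size(path)  # 4 header bytes + 1e6 / 8 data bytes = 125004
```

Unlike an rds file, this layout can be read by any program that knows about the header, but it gets no further compression, so a vector with long runs of identical values may still end up smaller with saveRDS.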

Richie Cotton