R seems to require four bytes of storage per integer, even for small ones:
> object.size(rep(1L, 10000))
40040 bytes
What's more, the same holds even for factors:
> object.size(factor(rep(1L, 10000)))
40456 bytes
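For the record, that works out to only a little over four bytes per element in both cases; the remainder is object overhead (the vector header, plus the levels and class attributes in the factor case):

> as.numeric(object.size(rep(1L, 10000))) / 10000
[1] 4.004
> as.numeric(object.size(factor(rep(1L, 10000)))) / 10000
[1] 4.0456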
I think this could be handled much better, especially in the latter case. Is there a solution that would reduce the storage requirements for this case to eight or even two bits per row? Perhaps one that uses the raw type internally for storage but otherwise behaves like a normal factor. The bit package offers this for bits, but I haven't found anything similar for factors.
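To make concrete what I have in mind, here is a rough sketch of the idea (assuming at most 255 levels and ignoring NA handling; pack_factor() and unpack_factor() are made-up names, not an existing API):

pack_factor <- function(f) {
  # store the 1-based level codes in a raw vector: one byte per element
  stopifnot(is.factor(f), nlevels(f) <= 255L, !anyNA(f))
  structure(as.raw(as.integer(f)), levels = levels(f))
}

unpack_factor <- function(r) {
  # rebuild an ordinary factor from the raw codes and the saved levels
  lev <- attr(r, "levels")
  factor(lev[as.integer(r)], levels = lev)
}

The packed object takes roughly a quarter of the original space, but of course it no longer behaves like a factor; a proper, supported type along these lines is what I'm looking for.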
My data frame with just a few million rows is consuming gigabytes, and that's a huge waste of memory and run time (!). Compression will reduce the required disk space, but again at the expense of run time.
Related: