I ran the following in RStudio, and the result surprised me:
rm(list = ls())
require(lobstr)
x <- 1:3
tracemem(x)
y <- x
x[1] <- 4L
obj_size(x)
obj_size(y)
Why are they different?
R uses a special compact storage for sequences. When you change the first entry, it drops back to the standard storage.
The special storage actually takes more space than the standard storage for a short sequence like 1:3, but its size would be the same for 1:3000000:
rm(list = ls())
library(lobstr)
x <- 1:3
y <- x
x[1] <- 4L
obj_size(x)
#> 64 B
obj_size(y)
#> 680 B
x <- 1:3000000
y <- x
x[1] <- 4L
obj_size(x)
#> 12,000,048 B
obj_size(y)
#> 680 B
Created on 2020-12-26 by the reprex package (v0.3.0)
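The 12,000,048 B figure can be checked by hand: a standard integer vector stores 4 bytes per element plus a fixed header. The sketch below assumes a 64-bit build of R where that header is 48 bytes (the value implied by the output above). The names n and expected are introduced here for illustration only.

```r
library(lobstr)

n <- 3000000L
x <- 1:n      # compact sequence storage
y <- x        # second binding, as in the example above
x[1] <- 4L    # x falls back to the standard storage

# Assumption: 48-byte vector header on 64-bit R, 4 bytes per integer.
expected <- 48 + 4 * n

# Compare the measured size of x with the hand-computed value.
as.numeric(obj_size(x)) == expected
```

The same arithmetic does not apply to y, which is still a compact sequence and so does not grow with its length.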
It's also quite hard to define the size of objects in R. For example, a sequence of length 3 (or 3 million) takes up 680 bytes, but two of them don't take up twice that:
x <- 1:3
obj_size(x)
#> 680 B
y <- 1:3000000
obj_size(y)
#> 680 B
z <- list(x, y)
obj_size(z)
#> 896 B
The size of z includes the size of the list() container as well as the two objects bound to x and y, but it's still only 216 bytes bigger than each of them. This is because some of the size attributed to x and y is shared: they're both the same kind of special object, so the code to handle that is only stored once.
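One way to see the sharing without the list() container is to pass both objects to a single obj_size() call, which counts shared components only once, and compare that with the sum of the two individual sizes. This is only a sketch, assuming R and lobstr versions that behave like the ones above. The names separate and together are made up for the example.

```r
library(lobstr)

x <- 1:3
y <- 1:3000000

# Sizes measured separately: the shared part is counted in both.
separate <- as.numeric(obj_size(x)) + as.numeric(obj_size(y))

# Size measured in one call: shared components are counted only once.
together <- as.numeric(obj_size(x, y))

# TRUE when part of the reported size is shared between x and y.
together < separate
```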