I am reading Hadley Wickham's Advanced R, and when it discusses the memory size of character vectors it says this:
R has a global string pool. This means that each unique string is only stored in one place, and therefore character vectors take up less memory than you might expect.
The example the book gives is this:
library(pryr)
object_size("banana")
#> 96 B
object_size(rep("banana", 10))
#> 216 B
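To convince myself of the pooling effect, I also tried a comparison of my own (not from the book), with pryr still loaded: many pointers to one pooled string versus the same number of distinct strings.

# 1000 pointers to the single pooled "banana" string
object_size(rep("banana", 1000))

# 1000 distinct strings, each needing its own entry in the pool;
# as I understand it, this should come out several times larger than the line above
object_size(paste0("banana", 1:1000))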
One of the exercises in this section is to compare the sizes of these two lists:
vec <- lapply(0:50, function(i) c("ba", rep("na", i)))
str <- lapply(vec, paste0, collapse = "")
object_size(vec)
#> 13.4 kB
object_size(str)
#> 8.74 kB
Now, since the passage states that R has a global string pool, and since vec is made up almost entirely of repetitions of just two strings ("ba" and "na"), I would intuitively expect vec to be smaller than str.
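As a quick sanity check of my own (not part of the exercise), vec really does contain only two distinct strings, while every element of str is unique:

length(unique(unlist(vec)))
#> [1] 2
length(unique(unlist(str)))
#> [1] 51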
So my question is: how could you most accurately estimate the size of those vectors beforehand?
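Here is my rough attempt, using only the figures the book quotes for 64-bit R (about 40 bytes of overhead per vector plus 8 bytes per element pointer), and ignoring both the strings themselves and any allocation rounding:

# outer list: 40 bytes of overhead plus 51 pointers of 8 bytes each
# each inner character vector of length n: 40 bytes plus n pointers of 8 bytes each
est_vec <- (40 + 51 * 8) + sum(40 + 8 * (1:51))  # vec[[i]] has length i
est_vec  # 13096 bytes, i.e. about 13.1 kB, reasonably close to the reported 13.4 kB

# same logic for str, which is 51 character vectors of length 1
est_str <- (40 + 51 * 8) + 51 * (40 + 8)
est_str  # 2896 bytes, i.e. about 2.9 kB, nowhere near the reported 8.74 kB

Presumably the missing memory in str is the unique strings themselves, which this naive calculation ignores, but I don't see how to count them (or the shared "ba" and "na" in vec) precisely.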