In R, I want to create a factor with only a few levels, but with a length of almost 100 million. The "normal" way for me to create a factor is to call factor on a character vector, but I expect this method to be very inefficient. What is the proper way to construct a long factor without fully expanding the corresponding character vector?
Here is an example of the wrong way to do it, which creates a character vector and then converts it to a factor:
long.char.vector = sample(c("left", "middle", "right"), replace=TRUE, 50000000)
long.factor = factor(long.char.vector)
How can I construct long.factor without first constructing long.char.vector? Yes, I know those two lines of code can be combined, but the resulting line of code still creates the gigantic char vector anyway.
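To make the goal concrete, here is a minimal sketch of the kind of construction I have in mind. It assumes that a factor is just an integer vector of level codes carrying "levels" and "class" attributes, so the codes can be sampled directly and the attributes attached afterwards. The helper name make_long_factor and the small demo length are hypothetical, chosen only for illustration, and I have not measured how this behaves at full size.

```r
# Hypothetical helper (not from the post above): builds a factor of length n
# by sampling integer level codes directly, so no character vector of
# length n is created along the way.
make_long_factor <- function(n, lvls) {
  # Draw n codes uniformly from 1..length(lvls), with replacement.
  codes <- sample.int(length(lvls), size = n, replace = TRUE)
  # Attach the level labels and the factor class to the integer codes.
  structure(codes, levels = lvls, class = "factor")
}

# Small demo; the real use case would pass n = 50000000.
small.factor <- make_long_factor(10, c("left", "middle", "right"))
```

The only character data held is the short vector of level labels, while the long vector stays integer throughout. Whether attaching attributes by hand like this is the proper way, rather than a hack, is part of what I am asking.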