Apparently if I try this:

# first grab the package
install.packages("stringi")
library(stringi)

# and then try to generate some serious dummy data
my_try <- as.vector(sample(1111111111:99999999999,3000000,replace=T))

R will say NOPE, sorry:

Error: cannot allocate vector of size 736.8 Gb

Should I buy more RAM*?

*this is a joke, but I seriously appreciate any help!

EDIT: The desired output is a dataframe of 20 variables, and 3x10^6 rows. Some columns/variables should be strings, some integers. All in lengths ranging from 2 to 12.

nick88
  • Just wondering, why do you need `stringi`? Cause `as.vector()` and `sample()` are both base R – 12b345b6b78 Oct 28 '18 at 18:36
  • Also, that's an incredibly large range of numbers. Do you really need it to be that wide, especially if you're okay with sampling with replacement? – 12b345b6b78 Oct 28 '18 at 18:38
  • Possible duplicate of [R memory management / cannot allocate vector of size n Mb](https://stackoverflow.com/questions/5171593/r-memory-management-cannot-allocate-vector-of-size-n-mb) – markus Oct 28 '18 at 18:39
  • I need a dataframe in the end of +/- 20 variables, and 3x10^6 rows. Some columns/variables should be strings, some integers. All in lengths ranging from 2 to 12. – nick88 Oct 28 '18 at 18:39
  • I thought this should be super simple, but I can't figure it out apparently :) – nick88 Oct 28 '18 at 18:39
  • If you narrow the range down, that will become more manageable. E.g. `a <- as.vector(sample(100000:999999,3000000,replace=T)); as.numeric(paste0(as.character(a), as.character(a)))` (concatenate the integer vectors as strings, doubling the nchar, and then coerce to numeric) – 12b345b6b78 Oct 28 '18 at 18:42

1 Answer

The error isn't coming from sampling 3 million values; it's from trying to create the population 1111111111:99999999999, about 98.9 billion values, from which to sample. Stored as doubles at 8 bytes each, that population is the 736.8 Gb in the error message. If you want to sample from that range, sample from the range 1:98888888889 and add 1111111110 using

sample(98888888889, 3000000, replace=TRUE) + 1111111110

There's no need for as.vector; the result of sample() is already a vector.
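
To make the arithmetic concrete, here is a minimal sketch that takes the two bounds from the question's code, works out how large the full population would be in memory, and then samples with an offset instead of building it. The variable names (`lo`, `hi`, `n_pop`, `size_gib`) are only illustrative, and the sample size is reduced so the example runs quickly.

```r
# Bounds copied from the question's code; the names are illustrative.
lo <- 1111111111
hi <- 99999999999

# Number of values in lo:hi, and the memory needed to hold them
# as doubles (8 bytes each), in units of 1024^3 bytes.
n_pop <- hi - lo + 1
size_gib <- n_pop * 8 / 1024^3
round(size_gib, 1)  # 736.8, the figure in the error message

# Draw from 1..n_pop without creating the population, then shift
# the result so it falls in lo..hi.
# n is kept small here; the question uses 3000000.
set.seed(1)
x <- sample(n_pop, 1000, replace = TRUE) + (lo - 1)
range(x)
```

Because the population has more values than an R integer can index, the result is a double vector rather than an integer vector.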

P.S. I believe in R-devel the range 1111111111:99999999999 will be stored much more efficiently (basically just the limits), but I don't know if sample() will be modified to work with it that way.
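
For the data frame described in the question's EDIT, a rough sketch under some assumptions might look like the following. It goes beyond what this answer covers: the column names, the choice of lowercase letters, and the reading of "lengths ranging from 2 to 12" as a digit or character count per value are all guesses, and only two of the 20 columns are shown.

```r
set.seed(42)
n <- 1000  # the question wants 3000000; kept small so the sketch runs quickly

# Numeric column whose values have 2 to 12 digits. Values above
# .Machine$integer.max do not fit in an R integer, so this stays double.
n_digits <- sample(2:12, n, replace = TRUE)
num_col <- floor(runif(n, min = 10^(n_digits - 1), max = 10^n_digits))

# String column with 2 to 12 lowercase letters per value (base R only).
n_chars <- sample(2:12, n, replace = TRUE)
str_col <- vapply(
  n_chars,
  function(k) paste(sample(letters, k, replace = TRUE), collapse = ""),
  character(1)
)

# Hypothetical column names; repeat the pattern for the remaining columns.
dummy <- data.frame(num_col = num_col, str_col = str_col,
                    stringsAsFactors = FALSE)
str(dummy)
```

The `vapply` loop is simple but may be slow at 3 million rows; since the question already loads stringi, `stri_rand_strings(n, length)` is probably a faster way to build the string columns.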

user2554330