
I'm currently using the `cube` function for balanced sampling in R. It works fine on moderately sized data, but when I use the entire population of 10,000,000+ units, R hangs. Is there an alternative that works with big data?

user876301
Can you please supply [a reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)? – Thomas Apr 23 '14 at 20:32

1 Answer


First, reinstall the BalancedSampling package to make sure you have the latest version (1.4). For me, `cube` seems to work fine for N = 10,000,000: selecting a sample takes about 30 s.

```r
library(BalancedSampling)
N <- 10000000                 # population size
n <- 100                      # sample size
p <- rep(n / N, N)            # equal inclusion probabilities (they sum to n)
# Balancing matrix: p itself (fixes the sample size) plus 3 auxiliary variables
X <- cbind(p, runif(N), runif(N), runif(N))
system.time(cube(p, X))
#    user  system elapsed
#   31.31    0.02   31.42
```
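Before running the full 10,000,000-unit example, it can help to check the installed version and see on a small population what `cube` returns. The sketch below is a scaled-down copy of the code above; N = 1000 and the seed are arbitrary illustration choices. It assumes that `cube(p, X)` returns the indices of the selected units and that including `p` as the first column of `X` fixes the sample size at n = sum(p).

```r
library(BalancedSampling)
packageVersion("BalancedSampling")   # check which version is installed

set.seed(1)                          # arbitrary seed, for reproducibility only
N <- 1000                            # small population, for illustration
n <- 100                             # sample size
p <- rep(n / N, N)                   # equal inclusion probabilities summing to n
X <- cbind(p, runif(N), runif(N), runif(N))  # p plus 3 auxiliary variables

s <- cube(p, X)                      # indices of the selected units
length(s)                            # expected to equal n, since p is a balancing column

# Horvitz-Thompson estimates of the auxiliary totals vs. the true totals;
# for a balanced sample these should be close to each other
ht   <- colSums(X[s, -1] / p[s])
true <- colSums(X[, -1])
print(round(rbind(ht, true), 1))
```

Only the sample size is fixed exactly; the other columns are balanced approximately, because the landing phase of the cube method generally cannot satisfy every balancing equation at once.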
Samuel-Rosa