
I have a large data.table and I am trying to generate binomial random numbers (using rbinom), taking the probability parameter of the distribution from one of the columns. Assume that index is a unique row identifier and that the parameter is in the responseProb column. My current approach draws one value per row:

dt[, response := rbinom(1, 1, responseProb), by = index]

rbinom's signature is rbinom(n, size, prob). As far as I can tell it is not vectorized over the prob argument and only accepts a scalar there, so I can't write the following, although I would like to:

dt[, response := rbinom(1, 1, responseProb)]

To give a simple example of what I mean, rbinom(1, 1, seq(0.1, 0.9, .1)) yields a single value rather than nine:

> rbinom(1, 1, seq(0.1, 0.9, .1))
[1] 1

I think that the solution to this is to use

dt[, response := rbinom(responseProb, 1, responseProb)]

but I want to double-check that this would give the same answer as the first line of code.

tchakravarty
  • rbinom is vectorized on the prob argument. But it will only use/generate as many observations as specified in n. So you'll want to make sure your n is at least as large as prob is long. – Dason Apr 14 '15 at 13:19
  • If you gave a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) that would help ... – Ben Bolker Apr 14 '15 at 13:34

1 Answer


rbinom is vectorized over prob, but it only returns n values. Pass .N (the number of rows) as the first argument so that you get one draw per row:

dt[, response := rbinom(.N, 1, responseProb)]
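As a minimal sketch of why the first argument matters, the probabilities below are set to 0 and 1 (values chosen purely for illustration) so that every draw is deterministic and the pairing between draws and probabilities can be read off directly:

```r
# Degenerate probabilities (0 and 1) make each draw deterministic.
p <- c(0, 1, 0, 1)

# n = 1: a single value comes back, drawn with the first probability only.
one_draw <- rbinom(1, 1, p)

# n = length(p): one draw per probability, in order.
all_draws <- rbinom(length(p), 1, p)

print(one_draw)   # [1] 0
print(all_draws)  # [1] 0 1 0 1
```

Inside a data.table call, .N plays the role of length(p) here.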

To check that this gives the same result as the by = index solution, set the same seed before each version and compare the results.

# create reproducible example
library(data.table)
set.seed(42)  # arbitrary seed so the probabilities themselves are reproducible
N <- 100
dt <- data.table(responseProb = runif(N),
                 index = 1:N)
# set seed
set.seed(1)
# your original version
dt[, response := rbinom(1, 1, responseProb), by = index]
# set seed again
set.seed(1)
# version with .N
dt[, response2 := rbinom(.N, 1, responseProb)]
# check for equality
dt[, all(response == response2)]
## [1] TRUE
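The form proposed in the question passes a vector as the first argument. R's documentation for rbinom says that if length(n) > 1, the length is taken to be the number of draws required, so that form should also return one draw per element. A small sketch, again using illustrative probabilities of 0 and 1 so the result is deterministic:

```r
p <- c(0, 1, 1)

# The first argument has length 3, so three draws are requested;
# only its length is used, not the values stored in it.
draws <- rbinom(p, 1, p)

print(draws)  # [1] 0 1 1
```

Using .N is arguably clearer, since it states the number of draws explicitly.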
shadow