R data.table: Generate random numbers

Question

I have a large data.table and I am trying to generate binomial random numbers (using rbinom), with the values in one of the columns as the probability parameter of the distribution. Assume that index is a unique row identifier and that the parameter is in the responseProb column. At the moment I use:

dt[, response := rbinom(1, 1, responseProb), by = index]

rbinom's signature is rbinom(n, size, prob), but since it does not appear to be vectorized over the prob argument and seems to accept only a scalar there, I can't write what I would like to be able to write:

dt[, response := rbinom(1, 1, responseProb)]

To give a simple example of what I mean, rbinom(1, 1, seq(0.1, 0.9, .1)) yields a single value:

> rbinom(1, 1, seq(0.1, 0.9, .1))
[1] 1

I think that the solution to this is to use

dt[, response := rbinom(responseProb, 1, responseProb)]

but I want to double-check that this would give the same answer as the first line of code.

rbinom is vectorized on the prob argument. But it will only use/generate as many observations as specified in n. So you'll want to make sure your n is at least as large as probs is long. — Dason, Apr 14 '15 at 13:19
If you gave a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) that would help ... — Ben Bolker, Apr 14 '15 at 13:34

score 3 · Answer 1 · answered Apr 14 '15 at 13:52

rbinom is vectorized over prob, so you can use .N (the number of rows in the table) as the first argument.

dt[, response := rbinom(.N, 1, responseProb)]
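As a quick illustration of that vectorization, here is a minimal sketch in base R. The vector p is made up for the example; probabilities of exactly 0 and 1 make the draws deterministic, so it is easy to see which probability each draw used.

```r
# Hypothetical probabilities: with size = 1, prob 0 always gives 0 and prob 1 always gives 1
p <- c(0, 1, 0, 1)

# n = length(p): one draw per probability, matched element by element
one_per_prob <- rbinom(length(p), 1, p)
print(one_per_prob)

# n = 1: only a single draw is made, using the first probability
single_draw <- rbinom(1, 1, p)
print(single_draw)

# n larger than length(p): prob is recycled to the length of the result
recycled <- rbinom(2 * length(p), 1, p)
print(recycled)
```

Inside dt[...] without a by clause, .N is the number of rows, so it plays the role of length(p) here.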

To check that this gives the same result as the by = index version, set the same seed before each call and compare the two columns.

# load the package
library(data.table)
# create reproducible example
N <- 100
dt <- data.table(responseProb = runif(N),
                 index = 1:N)
# set seed
set.seed(1)
# your original version
dt[, response := rbinom(1, 1, responseProb), by = index]
# set seed again
set.seed(1)
# version with .N
dt[, response2 := rbinom(.N, 1, responseProb)]
# check for equality
dt[, all(response == response2)]
## [1] TRUE
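The variant proposed in the question, passing the probability column itself as the first argument, should also work: the R documentation for rbinom says that when length(n) > 1, the length is taken to be the number of draws required. A small sketch in base R, with made-up deterministic probabilities (p is hypothetical, not the question's data):

```r
# Hypothetical probabilities chosen so that the result is deterministic
p <- c(0, 1, 1, 0)

# n is a vector of length 4, so its length (not its values) sets the number of draws
draws_vec_n <- rbinom(p, 1, p)

# the same call with the count given explicitly
draws_len_n <- rbinom(length(p), 1, p)

print(draws_vec_n)
print(length(draws_vec_n) == length(p))
```

If the table had a single row, the column would have length 1 and its value would be read as the count itself, so .N is the more robust choice.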
