How to avoid loops in outcome generator

Question

I have a data frame with probabilities for three outcomes, A, B and C, stored in the columns prob1, prob2 and prob3:

df = data.frame(prob1=runif(1000,0,0.2),prob2=runif(1000,0,0.1))
df$prob3 = 1-df$prob1-df$prob2

I am trying to simulate one outcome for each row, using that row's own probabilities, with the following loop:

df$outcome <- NA
for (i in seq_len(nrow(df))) {
  # one draw per row; the labels must be quoted strings, not undefined objects A, B, C
  df$outcome[i] <- sample(c("A", "B", "C"), 1, prob = c(df$prob1[i], df$prob2[i], df$prob3[i]))
}

I have a large data set and would like to avoid loops. How can I do that?
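Whatever replaces the loop, one way to check it is that the overall share of each outcome should be close to the average of that outcome's probability across rows (the law of total probability). A minimal sketch, using the loop above as the reference method; the sample size, seed and variable names are illustrative assumptions:

```r
set.seed(1)  # illustrative seed, only for reproducibility
n <- 20000
probs <- cbind(A = runif(n, 0, 0.2), B = runif(n, 0, 0.1))
probs <- cbind(probs, C = 1 - rowSums(probs))  # each row sums to 1

# Reference implementation: the row-by-row loop
outcome <- character(n)
for (i in seq_len(n)) outcome[i] <- sample(colnames(probs), 1, prob = probs[i, ])

# Observed share of each outcome vs. mean probability of that outcome
observed <- prop.table(table(factor(outcome, levels = colnames(probs))))
expected <- colMeans(probs)
max_gap <- max(abs(as.numeric(observed) - expected))
max_gap
```

The same comparison can be reused on the output of any vectorized replacement.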

Ben Bolker · Answer 1 · 2020-08-10T00:32:48.607

Here's one way via multinomial sampling:

m <- t(apply(df,1,rmultinom,n=1,size=1))  ## 1000 x 3 matrix of 0/1 values
w <- apply(m == 1, 1, which)              ## vector of 1000 values in {1,2,3}; which() needs a logical input

If you want labels you could follow this with c("A","B","C")[w].
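A self-contained sketch of the multinomial approach end to end, assuming the df from the question. It restricts apply() to the three probability columns (so it still works if df already has an outcome column) and uses max.col() to find the position of the single 1 in each row; the seed and the labs name are illustrative:

```r
set.seed(1)  # illustrative seed
df <- data.frame(prob1 = runif(1000, 0, 0.2), prob2 = runif(1000, 0, 0.1))
df$prob3 <- 1 - df$prob1 - df$prob2

# One multinomial draw of size 1 per row; each row of m looks like c(0, 1, 0)
m <- t(apply(df[, c("prob1", "prob2", "prob3")], 1, rmultinom, n = 1, size = 1))

# Column index of the 1 in each row (1 = A, 2 = B, 3 = C); no ties are possible
w <- max.col(m, ties.method = "first")

labs <- c("A", "B", "C")  # hypothetical label vector
df$outcome <- labs[w]
table(df$outcome)
```

Note that apply() still loops over rows in R, so this is tidier than the explicit loop but not necessarily much faster.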

If you want to go beyond base R, the Hmisc package has rMultinom:

library(Hmisc)
colnames(df) <- c("A","B","C")
w <- rMultinom(df, m=1)

I modified the column names because rMultinom automatically uses the column names as the values of the samples.

If you need really fast vectorized multinomial sampling and you're willing to deal with the hassle of compiled code, the answers to this question can help.
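Without compiled code, a base-R alternative that is vectorized over rows is inverse-CDF sampling (a different technique from the rmultinom() draw above): draw one uniform number per row and count how many of the row's cumulative probabilities it exceeds. A sketch, assuming every row of probabilities sums to 1; the helper name sample_outcomes is hypothetical:

```r
# Hypothetical helper: one categorical draw per row of a probability matrix.
# The only loop is over the (few) columns, not over the rows.
sample_outcomes <- function(probs, labels = colnames(probs)) {
  probs <- as.matrix(probs)
  k <- ncol(probs)
  # Row-wise cumulative probabilities, built column by column
  cum <- probs
  if (k > 1) for (j in 2:k) cum[, j] <- cum[, j - 1] + probs[, j]
  u <- runif(nrow(probs))        # one uniform draw per row
  # u is recycled down each column, so (u > cum)[i, j] compares u[i] with cum[i, j]
  idx <- rowSums(u > cum) + 1    # number of bounds exceeded, plus 1
  idx <- pmin(idx, k)            # guard against round-off when a row sums to just under 1
  labels[idx]
}

set.seed(1)  # illustrative seed
df <- data.frame(prob1 = runif(1000, 0, 0.2), prob2 = runif(1000, 0, 0.1))
df$prob3 <- 1 - df$prob1 - df$prob2
df$outcome <- sample_outcomes(df[, c("prob1", "prob2", "prob3")],
                              labels = c("A", "B", "C"))
```

Row i gets outcome A with probability prob1, B with probability prob2 and C otherwise, which is the same distribution as the loop in the question.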

Ronak Shah · Answer 2 · 2020-08-10T00:16:01.367


You can use apply:

df$outcome <- apply(df, 1, function(x) sample(c("A", "B", "C"), 1, prob = x))
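One caveat: apply() converts the whole data frame to a matrix, so if df already contains an outcome column (for instance after running the loop from the question), each row has four entries, prob = x no longer matches the three labels, and sample() fails. A sketch that restricts the call to the probability columns, assuming they are named prob1 to prob3 as in the question (prob_cols is an illustrative name):

```r
set.seed(1)  # illustrative seed
df <- data.frame(prob1 = runif(1000, 0, 0.2), prob2 = runif(1000, 0, 0.1))
df$prob3 <- 1 - df$prob1 - df$prob2

prob_cols <- c("prob1", "prob2", "prob3")
# apply() passes each row of the numeric sub-frame to the function as x
df$outcome <- apply(df[, prob_cols], 1,
                    function(x) sample(c("A", "B", "C"), 1, prob = x))

# Re-running still works because the existing outcome column is excluded
df$outcome <- apply(df[, prob_cols], 1,
                    function(x) sample(c("A", "B", "C"), 1, prob = x))
```

Like the for loop, apply() iterates over rows at the R level, so it mainly improves readability rather than speed.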

Or using dplyr rowwise():

library(dplyr)

df %>%
  rowwise() %>%
  mutate(outcome = sample(c("A", "B", "C"), 1, prob = c_across(prob1:prob3))) %>%
  ungroup()

