I'm trying to implement a generalized "switching equation" (Gerber and Green 2012, chapter 2) in R.
I have a categorical variable Z that takes k > 2 values.
I have k columns named Y_1, Y_2, ..., Y_k.
I want to make a variable Y that picks out the "right" value from each column. That is, for rows where Z is 1, Y takes the Y_1 value; where Z is 2, the Y_2 value; and so on.
I've got a solution with a loop, but it's annoying. Is there a super sweet way to do this with a one liner? No nested ifelse, pls.
N <- 100
df <- data.frame(
  Z = sample(1:3, N, replace = TRUE),
  Y_1 = rnorm(N),
  Y_2 = rnorm(N),
  Y_3 = rnorm(N)
)
# an annoying solution
df <- within(df, {
  Y <- rep(NA, nrow(df))
  Y[Z == 1] <- Y_1[Z == 1]
  Y[Z == 2] <- Y_2[Z == 2]
  Y[Z == 3] <- Y_3[Z == 3]
})
head(df)
which yields:
Z Y_1 Y_2 Y_3 Y
1 3 0.89124772 1.4377700 0.05226285 0.05226285
2 1 0.89186873 -0.6984839 -0.86141525 0.89186873
3 1 -0.01315678 1.5193461 0.18290065 -0.01315678
4 3 -0.57857274 -1.4445197 2.03764943 2.03764943
5 3 -0.19793692 -0.1818225 1.10270877 1.10270877
6 2 1.48291431 2.7264541 0.70129357 2.72645413
EDIT: I like Weihuang Wong's approach,
df$Y <- sapply(split(df, 1:nrow(df)), function(x) x[, paste0("Y_", x$Z)])
in part because it relies on the column names rather than on column position. All of the other answers offered so far use column position. I'm a tiny bit worried that sapply(split()) is slow, but maybe I'm crazy?
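
For comparison, here is a minimal sketch of a vectorized lookup that still goes by column name, using base R matrix indexing with a two-column (row, column) index. It assumes the outcome columns follow the Y_<value of Z> naming pattern from the example above. The helper names cols, y_names, and y_mat and the set.seed() call are illustrative additions, and no timing claim is made relative to sapply(split()).

```r
set.seed(1)  # added only so the sketch is reproducible
N <- 100
df <- data.frame(
  Z = sample(1:3, N, replace = TRUE),
  Y_1 = rnorm(N),
  Y_2 = rnorm(N),
  Y_3 = rnorm(N)
)

# Name of the column each row should read from, built from Z.
cols <- paste0("Y_", df$Z)

# Candidate outcome columns, selected by name rather than position.
y_names <- paste0("Y_", sort(unique(df$Z)))
y_mat <- as.matrix(df[y_names])

# A two-column index matrix picks one (row, column) cell per row.
df$Y <- y_mat[cbind(seq_len(nrow(df)), match(cols, y_names))]

head(df)
```

Because match() maps each name in cols to its position in y_names, the result should not depend on where the Y_ columns sit in the data frame, only on what they are called.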