I have data that comes to me with many similar variables, plus an additional variable that indicates, for each row, which one of those similar variables I really want. Using a loop I can look up the correct value, but the data is large, the loop is slow, and it seems like this should be vectorizable. I just haven't figured out how.
EDIT: The selected variable will be used as a new variable in the same data frame, so order matters. There are many other variables not shown in the example given below.
Example data set:
set.seed(0)
df <- data.frame(yr1 = sample(1000:1100, 8),
yr2 = sample(2000:2100, 8),
yr3 = sample(3000:3100, 8),
yr4 = sample(4000:4100, 8),
var = paste0("yr", sample(1:4, 8, replace = TRUE)))
# df
#
# yr1 yr2 yr3 yr4 var
# 1 1090 2066 3050 4012 yr3
# 2 1026 2062 3071 4026 yr2
# 3 1036 2006 3098 4038 yr1
# 4 1056 2020 3037 4001 yr4
# 5 1088 2017 3075 4037 yr3
# 6 1019 2065 3089 4083 yr4
# 7 1085 2036 3020 4032 yr1
# 8 1096 2072 3061 4045 yr3
This loop method does the trick, but is slow and awkward:
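# the yr* columns are integers, so preallocate an integer vector
ycode <- integer(nrow(df))
for (i in seq_len(nrow(df))) {
  # as.character() guards against var being stored as a factor
  ycode[i] <- df[i, as.character(df$var[i])]
}
df$ycode <- ycode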
# df
# yr1 yr2 yr3 yr4 var ycode
# 1 1090 2066 3050 4012 yr3 3050
# 2 1026 2062 3071 4026 yr2 2062
# 3 1036 2006 3098 4038 yr1 1036
# 4 1056 2020 3037 4001 yr4 4001
# 5 1088 2017 3075 4037 yr3 3075
# 6 1019 2065 3089 4083 yr4 4083
# 7 1085 2036 3020 4032 yr1 1085
# 8 1096 2072 3061 4045 yr3 3061
It seems like I should be able to vectorize this, like so:
df$ycode <- df[, df$var]
But I find the result surprising:
# yr1 yr2 yr3 yr4 var ycode.yr3 ycode.yr2 ycode.yr1 ycode.yr4 ycode.yr3.1 ycode.yr4.1 ycode.yr1.1 ycode.yr3.2
# 1 1090 2066 3050 4012 yr3 3050 2066 1090 4012 3050 4012 1090 3050
# 2 1026 2062 3071 4026 yr2 3071 2062 1026 4026 3071 4026 1026 3071
# 3 1036 2006 3098 4038 yr1 3098 2006 1036 4038 3098 4038 1036 3098
# 4 1056 2020 3037 4001 yr4 3037 2020 1056 4001 3037 4001 1056 3037
# 5 1088 2017 3075 4037 yr3 3075 2017 1088 4037 3075 4037 1088 3075
# 6 1019 2065 3089 4083 yr4 3089 2065 1019 4083 3089 4083 1019 3089
# 7 1085 2036 3020 4032 yr1 3020 2036 1085 4032 3020 4032 1085 3020
# 8 1096 2072 3061 4045 yr3 3061 2072 1096 4045 3061 4045 1096 3061
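The wide result comes from `df[, df$var]` treating `df$var` as a vector of eight column names to select, so it returns eight whole columns instead of one value per row. One way to pair each row with its own column is to index a matrix with a two-column (row, column) matrix. This is a minimal sketch, not necessarily the fastest option for large data; the names `yr_cols`, `m`, and `idx` are made up for illustration, and the data frame is typed in from the printed example rather than regenerated with `set.seed`, since `sample()` output can differ between R versions.

```r
# Example data copied by hand from the printed data frame above
df <- data.frame(
  yr1 = c(1090, 1026, 1036, 1056, 1088, 1019, 1085, 1096),
  yr2 = c(2066, 2062, 2006, 2020, 2017, 2065, 2036, 2072),
  yr3 = c(3050, 3071, 3098, 3037, 3075, 3089, 3020, 3061),
  yr4 = c(4012, 4026, 4038, 4001, 4037, 4083, 4032, 4045),
  var = c("yr3", "yr2", "yr1", "yr4", "yr3", "yr4", "yr1", "yr3"),
  stringsAsFactors = FALSE
)

# Candidate columns (hypothetical helper name)
yr_cols <- c("yr1", "yr2", "yr3", "yr4")

# Numeric matrix holding only the candidate columns
m <- as.matrix(df[yr_cols])

# Two-column index matrix: row number and column position for each row
idx <- cbind(seq_len(nrow(df)), match(df$var, yr_cols))

# Indexing a matrix by a two-column matrix picks one cell per row, in row order
df$ycode <- m[idx]
```

Because the result is in the original row order, it can be assigned straight back as a new column, and the other columns of the data frame are left alone.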
I also tried numerous variations on *apply, but none of those even came close. Some attempts:
> apply(df, 1, function(x) x[x$var])
Error in x$var : $ operator is invalid for atomic vectors
> apply(df, 1, function(x) x[x[var]])
Error in x[var] : invalid subscript type 'closure'
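Both errors follow from how `apply` works: it first converts the data frame to a matrix, and because `var` is a character column, each row reaches the function as a named character vector. `$` is not defined for atomic vectors, and the bare name `var` resolves to the function `stats::var`, which explains the 'closure' message. A sketch of a variant that does run is below, with the caveat that it is still a row-by-row loop under the hood and the values come back as strings that need converting; `picked` is an illustrative name, and the data frame is again typed in from the printed example.

```r
# Example data copied by hand from the printed data frame above
df <- data.frame(
  yr1 = c(1090, 1026, 1036, 1056, 1088, 1019, 1085, 1096),
  yr2 = c(2066, 2062, 2006, 2020, 2017, 2065, 2036, 2072),
  yr3 = c(3050, 3071, 3098, 3037, 3075, 3089, 3020, 3061),
  yr4 = c(4012, 4026, 4038, 4001, 4037, 4083, 4032, 4045),
  var = c("yr3", "yr2", "yr1", "yr4", "yr3", "yr4", "yr1", "yr3"),
  stringsAsFactors = FALSE
)

# Each x is a named character vector; x[["var"]] is the column name to look up
picked <- apply(df, 1, function(x) x[[x[["var"]]]])

# apply() returned strings, so convert back to numbers before storing
df$ycode <- as.numeric(picked)
```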
Any ideas? Many thanks.