Suppose I have two data frames:
df1 <- data.frame(A = 1:6, B = 7:12, C = rep(1:2, 3))
df2 <- data.frame(C = 1:2, D = c("A", "B"))
I want to create a new column E in df1 whose value depends on column C, which is linked to column D in df2. For example, C in the first row of df1 is 1. In df2, C = 1 corresponds to D = "A", so E in that row of df1 should be taken from column A, i.e., 1.
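To make the target concrete, here is a minimal base R sketch of the result I expect for the toy data. The helper names src_col, idx, and expected_E are only for illustration, and I have not checked how this scales to large data:

```r
df1 <- data.frame(A = 1:6, B = 7:12, C = rep(1:2, 3))
df2 <- data.frame(C = 1:2, D = c("A", "B"), stringsAsFactors = FALSE)

# For each row of df1, find the name of the column to read from (via df2$D)
src_col <- df2$D[match(df1$C, df2$C)]

# Build (row, column) pairs and pick one value per row by matrix indexing
idx <- cbind(seq_len(nrow(df1)), match(src_col, names(df1)))
expected_E <- as.matrix(df1)[idx]

expected_E
# 1 8 3 10 5 12
```

So rows with C = 1 take their value from column A, and rows with C = 2 take it from column B.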
As suggested by "Select values from different columns based on a variable containing column names", I can achieve this in two steps:
library(data.table)
setDT(df1)
setDT(df2)
df3 <- df1[df2, on = "C"] # step 1: join the two data.tables on C
df3[, E := .SD[[.BY[[1]]]], by = D] # step 2: within each D group, copy the column named by D into E
My question is: can this be done in one step? Also, since my data is relatively large, the first step (the join) takes a lot of time. Is there a faster way to do this? Any suggestions?