Create a new column from different columns of one data frame conditioned on another column from another data frame

Question

Suppose I have two data frame

df1 <- data.frame(A = 1:6, B = 7:12, C = rep(1:2, 3))
df2 <- data.frame(C = 1:2, D = c("A", "B"))

I want to create a new column E in df1 whose value is based on the values of Column C, which can then be connected to Column D in df2. For example, the C value in the first row of df1 is "1". And value 1 of column C in df2 corresponds to "A" of Column D, so the value E created in df2 should from column "A", i.e., 1.

As suggested by Select values from different columns based on a variable containing column names, I can achieve this by two steps:

setDT(df1)
setDT(df2)
df3 <- df1[df2, on = "C"] # step 1 combines the two data.tables
df3[, E := .SD[[.BY[[1]]]], by = D] # step 2

My question is: Could we do this in one step? Furthermore, as my data is relatively large, the first step in this original solution takes a lot time. Could we do this in a faster way? Any suggestions?

If it was marked as duplicated, it means that the answer was in the other question;) — Emmanuel-Lin, Aug 29 '17 at 11:04
sorry for the confusing, the updated question is now different from the original one — Likan Zhan, Aug 29 '17 at 11:32
See my answer [here](https://stackoverflow.com/a/45938588/680068), see if it works. Instead of merging, make a named vector. — zx8754, Aug 29 '17 at 11:59
Try this: `df1$E <- setNames(as.character(df2$D), df2$C)[ df1$C ]` — zx8754, Aug 29 '17 at 12:05
You could try `df1[, E := df2[df1, get(as.character(D)), on = .(C), by = .EACHI]$V1]` perhaps. If `D` is already a `character` it would better. Also, data.table currently doesn't support matrix subsetting- which would be the best solution here probably (e.g. https://stackoverflow.com/questions/27023819/subset-a-matrix-according-to-a-columns-vector) — David Arenburg, Aug 29 '17 at 12:34
Hm, neither of the answers in the link suggested making `df3`. Anyway, I'll probably post one shortly... — Frank, Aug 29 '17 at 17:03

myincas · Answer 1 · 2017-08-30T05:48:58.013

0

you can try this, the C column can indicates column value from df1

setDT(df1) df1[, e := eval(parse(text = names(df1)[C])), by = 1:nrow(df1)] df1

A B C e 1: 1 7 1 1 2: 2 8 2 8 3: 3 9 1 3 4: 4 10 2 10 5: 5 11 1 5 6: 6 12 2 12

edited Aug 30 '17 at 05:48

answered Aug 29 '17 at 12:08

myincas

1,500
10
15

Please don't forget to add some text explaining your answer - why it works and how it solves the original problem. – Balagurunathan Marimuthu Aug 30 '17 at 05:18

score 0 · Accepted Answer · answered Aug 29 '17 at 17:06

Here's how I would do it:

df1[df2, on=.(C), D := i.D][, E := .SD[[.BY$D]], by=D]

   A  B C D  E
1: 1  7 1 A  1
2: 2  8 2 B  8
3: 3  9 1 A  3
4: 4 10 2 B 10
5: 5 11 1 A  5
6: 6 12 2 B 12

This adds the columns to df1 by reference instead of making a new table and so I guess is more efficient than building df3. Also, since they're added to df1, the rows retain their original ordering.

Create a new column from different columns of one data frame conditioned on another column from another data frame

2 Answers2