17

My question relates to the creation of a variable which depends upon other columns within a data.table when none of the variable names are known in advance.

Below is a toy example where I have 5 rows and the new variable should be 1 when the condition is equal to A and 4 elsewise.

library(data.table)
DT <- data.table(Con = c("A","A","B","A","B"),
                 Eval_A = rep(1,5),
                 Eval_B = rep(4,5))
Col1 <- "Con"
Col2 <- "Eval_A"
Col3 <- "Eval_B"
Col4 <- "Ans"

The code below works but feels like I'm misusing the package!

DT[,Col4:=ifelse(DT[[Col1]]=="A",
                 DT[[Col2]],
                 DT[[Col3]]),with=FALSE]

Update: Thanks, I did some quick timing of the answers below. Once on a data.table with 5 million rows and only the relevant columns and again after adding 10 non relevant columns, below are the results:

+-------------------------+---------------------+------------------+
|         Method          | Only relevant cols. | With extra cols. |
+-------------------------+---------------------+------------------+
| List method             | 1.8                 | 1.91             |
| Grothendieck - get/if   | 26.79               | 30.04            |
| Grothendieck - get/join | 0.48                | 1.56             |
| Grothendieck - .SDCols  | 0.38                | 0.79             |
| agstudy - Substitute    | 2.03                | 1.9              |
+-------------------------+---------------------+------------------+

Look's like .SDCols is best for speed and using substitute for easy to read code.

Dominic Edwards
  • 3,608
  • 1
  • 15
  • 9

2 Answers2

13

1. get/if Try using get :

DT[, (Col4) := if (get(Col1) == "A") get(Col2) else get(Col3), by = 1:nrow(DT)]

2. get/join or try this approach:

setkeyv(DT, Col1)
DT[, (Col4):=get(Col3)]["A", (Col4):=get(Col2)]

3. .SDCols or this:

setkeyv(DT, Col1)
DT[, (Col4):=.SD, .SDcols = Col3]["A", (Col4):=.SD, .SDcols = Col2]

UPDATE: Added some additional approaches.

G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
  • 2
    In the first answer, it should be `Col4 :=` or `c("Col4") :=`. – Arun Jul 03 '13 at 14:54
  • 1
    `data.table` generally isn't set up to take a character variable that refers to a column - but I think `get` is the best approach that works. – Señor O Jul 03 '13 at 14:55
  • 1
    @Arun, `Col4:=` and `c("Col4"):=` both literally give `Col4` as the column name but the desired column name is `Ans`, not `Col4`. – G. Grothendieck Jul 03 '13 at 14:59
  • @G.Grothendieck, for some reason, when I first ran it, it gave me an error. I am unable to reproduce it now. I get your point. – Arun Jul 03 '13 at 15:00
  • I'm trying to work on the SDCols solution unsuccessfully. What is the explanation of `DT[, (Col4):=.SD, .SDcols = Col3]["A", (Col4):=.SD, .SDcols = Col2]`? `, (Col4):=.SD, .SDcols = Col3]` is "assign value of column 3 to the corresponding cell in Col4"? Is this the default option and then the second bracket is the actual condition `["A", (Col4):=.SD, .SDcols = Col2]`? This second condition would be, if the key is "A", col4 equals Col2? – user3507584 Apr 13 '15 at 15:59
6

Using ifelse and get:

DT[, (Col4) := ifelse (get(Col1) == "A",get(Col2) , get(Col3))]

Or using substitute to create the expression like this :

expr <- substitute(a4 := ifelse (a1 == "A",a2 , a3),
                   list(a1=as.name(Col1),
                        a2=as.name(Col2),
                        a3=as.name(Col3),
                        a4=as.name(Col4)))

DT[, eval(expr)]
agstudy
  • 119,832
  • 17
  • 199
  • 261