1

My question is as follows:

I would like to generate column d based on the information in column c. For each row, column c gives the name of the column from which to fetch the value.

   a  b  c  d
1  5  3  a  5
2  8  6  b  6
3 12  8  a 12

My current method is very inefficient:

DT[, d:=mget(c)]
for(i in 1:nrow(DT)) { e[i] <- DT[,d][[i]][i]}
DT[,e:=e]

I would greatly appreciate a one-liner solution.

gre_gor
Luke Shi
  • The method you posted doesn't actually work...please post a clearer example of what your question/desired output is – Mike H. Jun 18 '16 at 00:59

4 Answers

6

You can group by the values in column c, and use get() to get the values.

dt[, d := get(c), by = c]

which gives

dt
#     a b c  d
# 1:  5 3 a  5
# 2:  8 6 b  6
# 3: 12 8 a 12

Data:

dt <- data.table(a = c(5, 8, 12), b = c(3, 6, 8), c = c("a", "b", "a"))
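
To illustrate why grouping makes `get()` work here, a minimal sketch follows. The table `dt3`, its column names (`x`, `y`, `z`, `sel`, `picked`) and its values are made up for illustration and are not from the question. Within each group the grouping column has length one, so `get()` receives a single column name and returns that column for the rows of the group.

```r
library(data.table)

# Hypothetical data: three value columns and a selector column `sel`
dt3 <- data.table(x = c(1, 2, 3, 4),
                  y = c(10, 20, 30, 40),
                  z = c(100, 200, 300, 400),
                  sel = c("z", "x", "y", "z"))

# Inside each group `sel` is a single string, so get(sel) looks up exactly
# one column and returns its values for the rows of that group.
dt3[, picked := get(sel), by = sel]

# Without `by`, get() would be handed the whole vector c("z", "x", "y", "z"),
# which is not a single column name.
print(dt3)
```

The number of groups equals the number of distinct column names in the selector, not the number of rows, which is why this tends to scale better than a row-by-row loop.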
Rich Scriven
  • I tried this but got :Error in `[.data.frame`(dt, , `:=`(d, get(c)), by = seq_len(nrow(dt))) : unused argument (by = seq_len(nrow(dt))) – milan Jun 18 '16 at 02:12
  • @milan - Did you do `library(data.table)`? This is a data.table tagged question so I didn't think it was necessary. – Rich Scriven Jun 18 '16 at 02:13
  • Great, thanks. One more thing. If I leave the ':' in, it doesn't work. Gives error message. Without it works fine? – milan Jun 18 '16 at 02:16
  • 2
    Do `setDT(dt)` to set the data as a data table? I don't know what the issue is. It works fine for me. – Rich Scriven Jun 18 '16 at 02:20
2

You don't need a grouped operation for this; a row-wise sapply() over the data.table also works:

DT$d <- sapply(seq_len(nrow(DT)), function(i) DT[i, get(as.character(DT[i, c]))])

> DT
    a b c  d
1:  5 3 a  5
2:  8 6 b  6
3: 12 8 a 12

This solution is also more flexible in that it allows c to refer to any column in the data.

data

library(data.table)
# c is a factor here, as in the original dput() output
DT <- data.table(a = c(5L, 8L, 12L), b = c(3L, 6L, 8L),
                 c = factor(c("a", "b", "a")))
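
As a side note on the `as.character()` call: when `c` is a factor, the lookup needs the factor's labels rather than the factor itself. The sketch below is a variation that uses `[[` instead of data.table's `[`, so it should also run on a plain data.frame. The object `DF` is hypothetical and only mirrors the example data.

```r
# Hypothetical data.frame mirroring the example; `c` is deliberately a factor
DF <- data.frame(a = c(5, 8, 12), b = c(3, 6, 8),
                 c = factor(c("a", "b", "a")))

# as.character() turns the factor into its labels ("a", "b");
# DF[[name]][i] then reads row i of the column with that name.
DF$d <- vapply(seq_len(nrow(DF)),
               function(i) DF[[as.character(DF$c[i])]][i],
               numeric(1))
print(DF)
```

`vapply()` with `numeric(1)` is used here so that the result is guaranteed to be a numeric vector; this assumes all referenced columns are numeric.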
Mike H.
  • Thanks Mike. However, I had a small issue with the code, so I modified it to the following, which worked: sapply(1:nrow(DT),function(i) {DT[i,get(eval(DT[i,c]))]} ) – Luke Shi Jun 18 '16 at 01:37
  • Hmm, it should work regardless of whether your `c` is a character or a factor, whereas I think your code only works if `DT$c` is a character. Can you post the error? – Mike H. Jun 18 '16 at 01:44
  • I get this error : Error in DT[1, c] : incorrect number of dimensions. I used Eval to make the error go away... – Luke Shi Jun 18 '16 at 01:57
1

Your data:

a <- c(5,8,12)
b <- c(3,6,8)
c <- c("a", "b", "a")
df <- data.frame(a, b, c)  # data.frame() keeps a and b numeric; cbind() would coerce every column to character

This is how you could do it.

d <- NULL
for (i in 1:NROW(df)){d <- c(d, as.character(df[i,as.character(c[i])]))}
df$d <- d

#   a b c  d
#1  5 3 a  5
#2  8 6 b  6
#3 12 8 a 12

The same thing can be done in one line by replacing the for loop with sapply (similar to MikeyMike's answer).

df$d <- sapply(1:NROW(df), function(i){as.character(df[i,as.character(c[i])])})
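
For larger data, the row-by-row loop can be avoided altogether. The sketch below uses a different technique from the one above, namely base R matrix indexing with a two-column (row, column) index. The names `df2`, `value_cols`, `m` and `idx` are made up for illustration, and it assumes every column that `c` can point to is numeric.

```r
# Hypothetical copy of the example data
df2 <- data.frame(a = c(5, 8, 12), b = c(3, 6, 8),
                  c = c("a", "b", "a"), stringsAsFactors = FALSE)

value_cols <- c("a", "b")          # columns that `c` may point to
m <- as.matrix(df2[value_cols])    # numeric matrix of candidate values

# A two-column index matrix (row number, column number) picks one cell per row
idx <- cbind(seq_len(nrow(df2)), match(df2$c, value_cols))
df2$d <- m[idx]
print(df2)
```

Because the lookup is a single vectorized indexing call, `d` stays numeric and no per-row function call is needed.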
milan
  • Thanks Milan for the data.frame solution, this works. Since I am working with a very large data.table, I wonder if data.table specific methods would improve speed. – Luke Shi Jun 18 '16 at 02:18
0

You can use an ifelse statement:

dt[, d := ifelse(c == "a", a, b)]
dt
#     a b c  d
# 1:  5 3 a  5
# 2:  8 6 b  6
# 3: 12 8 a 12

Another option is to reshape your data to long format, which also handles the case of many columns:

dt[, id := seq_len(nrow(dt))][    # create an id column for reshaping
  # reshape to long format, keep the rows whose column name matches c,
  # then join back to the original data on id
  melt(dt, id.vars = c("id", "c"))[c == variable], d := value, on = "id"][
  , id := NULL]                   # drop the id column

dt
#     a b c  d
# 1:  5 3 a  5
# 2:  8 6 b  6
# 3: 12 8 a 12
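
The reshape approach may be easier to follow when split into steps. The sketch below does that on a copy of the example data; the names `dt4`, `long` and `matched` are introduced only for illustration.

```r
library(data.table)

# Hypothetical copy of the example data
dt4 <- data.table(a = c(5, 8, 12), b = c(3, 6, 8), c = c("a", "b", "a"))
dt4[, id := .I]                                  # row id to join back on

# Long format: one row per (id, original column) pair
long <- melt(dt4, id.vars = c("id", "c"),
             variable.name = "variable", value.name = "value")

# `variable` is a factor of the original column names; compare it as character
matched <- long[c == as.character(variable)]
print(matched)

dt4[matched, d := i.value, on = "id"]            # update join on id
dt4[, id := NULL]                                # drop the helper column
print(dt4)
```

The intermediate `matched` table holds exactly one row per original row, which is what makes the join back on `id` unambiguous.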
Psidom
  • Thanks, Psidom. However I have about 150 columns and the example posted is a simplified version of the original problem... – Luke Shi Jun 18 '16 at 01:36
  • @LukeShi You should have stated the dimensions of the underlying problem in your question. Some of the answers were misled by the lack of detail. Also, please consider providing the code to create your sample data with your question, to save the time of the people answering. Thank you. – Uwe Jun 19 '16 at 04:49