5

I have a data table with a number of columns containing values. I have another column which defines which one of those columns whose value I need to select. I am having trouble finding a way to do this.

Here is a simple example.

> d <- data.table(
     value.1 = c("one", "uno", "1"),
     value.2 = c("two", "dos", "2"),
     name.of.col = c("value.1","value.2","value.1"))

> d
   value.1 value.2 name.of.col
1:     one     two     value.1
2:     uno     dos     value.2
3:       1       2     value.1

I would like to add a column 'value.of.col' which contains the value of the column specified by 'name.of.col'.

> d
   value.1 value.2 name.of.col  value.of.col
1:     one     two     value.1  one
2:     uno     dos     value.2  dos
3:       1       2     value.1  1
FXQuantTrader
  • 6,821
  • 3
  • 36
  • 67
Nick Allen
  • 1,443
  • 1
  • 11
  • 29
  • Thanks for the replies so far. The answers definitely work, but are very memory intensive. Can anyone think of a good way to do this on a rather large data table? The table I am working with is 700k rows, 132 columns and about 700MB. – Nick Allen Jan 30 '14 at 20:33
  • does the second solution faster? – agstudy Jan 31 '14 at 01:22

3 Answers3

6

Another option:

d[ , value.of.col := diag(as.matrix(.SD)), .SDcols = d[ , name.of.col]]
> d
   value.1 value.2 name.of.col value.of.col
1:     one     two     value.1          one
2:     uno     dos     value.2          dos
3:       1       2     value.1            1

EDIT add a faster solution:

d[ , value.of.col :=
      melt(d,id.vars='name.of.col')[name.of.col==variable, value]]
MichaelChirico
  • 33,841
  • 14
  • 113
  • 198
agstudy
  • 119,832
  • 17
  • 199
  • 261
  • 1
    +1 interesting use of column recycling and diag; you have a remarkable way of thinking about data; it would have never occurred to me to do it this way. – BrodieG Jan 30 '14 at 20:15
  • 1
    @BrodieG thanks. I add a new solution since the Op find the 2 solutions a little bit slow. – agstudy Jan 30 '14 at 20:52
2

You can use matrix indexing to pull values from the first and second columns:

mx.idx <- d[, cbind(1:nrow(d), match(name.of.col, names(d)))]
d[, 
  value.of.col:=
    as.matrix(d[, 1:2])[mx.idx]
 ]
d
#    value.1 value.2 name.of.col value.of.col
# 1:     one     two     value.1          one
# 2:     uno     dos     value.2          dos
# 3:       1       2     value.1            1
MichaelChirico
  • 33,841
  • 14
  • 113
  • 198
BrodieG
  • 51,669
  • 9
  • 93
  • 146
2

The following should be memory efficient and a little easier to read/follow.

for (i in unique(d[["name.of.col"]]))
    d[ name.of.col==i, value.of.col:=get(i) ]

d
   value.1 value.2 name.of.col value.of.col
1:     one     two     value.1          one
2:     uno     dos     value.2          dos
3:       1       2     value.1            1
Matt Dowle
  • 58,872
  • 22
  • 166
  • 224