
I would like to use the DT[, lapply(.SD, func), by=group, .SDcols=cols] syntax in a data.table, but I would like to pass a column of DT to func(). Is there a way to get this to work? For example,

indexfunc <- function(col, indexcol, indexvalue)
  col/col[indexcol==indexvalue]

library(data.table)

DT <- data.table(group=c('A','A','B','B'), indexkey=c(1,2,1,2), value=1:4)

# Works
DT[, indexfunc(value, indexkey, 2), by=group]

# Fails, Error in indexfunc(value, indexkey, 2) : object 'indexkey' not found
DT[, lapply(.SD, indexfunc, indexkey, 2), by=group, .SDcols=c("value")]
Abiel
  • Seems like this may be related: http://stackoverflow.com/questions/24152199/r-data-table-using-lapply-on-functions-defined-outside – MrFlick May 06 '15 at 02:38
  • This is a known bug: https://github.com/Rdatatable/data.table/issues/495 – MichaelChirico May 06 '15 at 16:05
  • Dupe of this one: http://stackoverflow.com/questions/27755518/data-table-sd-lapply-multiple-columns-in-argument Don't know if it's worth marking as such. – Frank May 06 '15 at 16:17

1 Answer


I think any strategy here necessarily entails some bad programming, but the following works:

DT[, lapply(
  .SD[, "value"],
  indexfunc, indexcol = indexkey, indexvalue = 2
), by = group]

gives the output

   group value
1:     A  0.50
2:     A  1.00
3:     B  0.75
4:     B  1.00

The approach in the OP didn't work because supplying .SDcols restricts the set of columns available in j of DT[i, j] to the ones it lists, so indexkey cannot be found there. I think the extra arguments passed through lapply to the function must also be named.
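If several columns need the same treatment, one possible variation is to list the index column in .SDcols as well and pull it out of .SD explicitly. This is only a sketch under assumptions: the extra column value2 and the vector cols are made up for illustration and are not part of the original question, and it loops over column names rather than over .SD itself.

```r
library(data.table)

# Same helper as in the question: divide a column by its value
# at the row where indexcol equals indexvalue.
indexfunc <- function(col, indexcol, indexvalue)
  col / col[indexcol == indexvalue]

# value2 is a hypothetical second column added for illustration.
DT <- data.table(group = c("A", "A", "B", "B"),
                 indexkey = c(1, 2, 1, 2),
                 value = 1:4,
                 value2 = c(10, 20, 30, 60))

cols <- c("value", "value2")  # columns to rescale (assumed names)

# Include indexkey in .SDcols so it is part of .SD, then loop over
# the names in cols and read both columns from .SD inside the function.
res <- DT[, lapply(setNames(nm = cols),
                   function(cn) indexfunc(.SD[[cn]], .SD[["indexkey"]], 2)),
          by = group, .SDcols = c(cols, "indexkey")]

print(res)
```

Because indexkey is taken from .SD, this does not depend on whether columns outside .SDcols are visible in j.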

MichaelChirico
Frank