Imagine you want to apply a function row-wise on a data.table. The function's arguments correspond to fixed data.table columns as well as dynamically generated column names.

Is there a way to supply both fixed and dynamic column names as arguments to a function when working with a data.table?

The problems are:

  • Both variable names and dynamically generated strings have to be passed as arguments to a function applied over a data.table
  • The dynamic column name strings are stored in a vector with more than one entry (so a single get() won't work)
  • The dynamic columns' values need to be supplied to the function as one vector

This illustrates it:

library('data.table')
# Sample dataframe
D <- data.table(id=1:3, fix=1:3, dyn1=1:3, dyn2=1:3) #fixed and dynamic column names
setkey(D, id)
# Sample function
foo <-function(fix, dynvector){ rep(fix,length(dynvector)) %*% dynvector}
# It does not matter what this function does.

# The result when passing column names not dynamically
D[, "new" := foo(fix,c(dyn1,dyn2)), by=id]
#    id fix dyn1 dyn2 new
# 1:  1   1    1    1   2
# 2:  2   2    2    2   8
# 3:  3   3    3    3  18

I want to get rid of the c(dyn1,dyn2). I need to get the column names dyn1, dyn2 from another vector which holds them as strings.

This is how far I got:

# Now we try it dynamically
cn <-paste("dyn",1:2,sep="")   #vector holding column names "dyn1", "dyn2"

# Approaches that don't work
D[, "new" := foo(fix,c(cn)), by=id]            #wrong as using a mere string
D[, "new" := foo(fix,c(cn)), by=id, with=F]    #does not work
D[, "new" := foo(fix,c(get(cn))), by=id]       #uses only the first element "dyn1"
D[, "new" := foo(fix,c(mget(cn, .GlobalEnv, inherits=T))), by=id]       #does not work
D[, "new" := foo(fix,c(.SD)), by=id, .SDcols=cn]       #does not work

I suppose mget() is the solution, but I know too little about scoping to figure it out.

Thanks! JBJ


Update: Solution

based on the answer by BondedDust

    D[, "new" := foo(fix,sapply(cn, function(x) {get(x)})), by=id]
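An alternative to wrapping get() is to let .SDcols pick the columns named in cn and flatten the per-group .SD into a vector with unlist(). This is only a sketch under some assumptions: foo here is a stand-in that returns a plain scalar (the same values as the function above, but via sum() instead of %*%), and the real function is assumed to accept a vector as its second argument.

```r
library(data.table)

D  <- data.table(id = 1:3, fix = 1:3, dyn1 = 1:3, dyn2 = 1:3)
cn <- paste0("dyn", 1:2)   # column names held as strings: "dyn1", "dyn2"

# Stand-in for the real function (assumption): returns a scalar per group
foo <- function(fix, dynvector) sum(rep(fix, length(dynvector)) * dynvector)

# .SDcols restricts .SD to the columns in cn; unlist() turns the one-row
# .SD of each group into a vector. 'fix' is still visible as a column in j.
D[, new := foo(fix, unlist(.SD)), by = id, .SDcols = cn]
print(D)
```

The fixed column is referred to by its bare name while the dynamic ones arrive through .SD, so no string-to-symbol lookup is needed in j.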
JBJ
  • BTW, your call to setkey throws an error. – IRTFM May 10 '14 at 19:23
  • Does this answer your question? [How can one work fully generically in data.table in R with column names in variables](https://stackoverflow.com/questions/24833247/how-can-one-work-fully-generically-in-data-table-in-r-with-column-names-in-varia) – jangorecki Nov 21 '20 at 15:04

1 Answer

I wasn't able to figure out what you were trying to do with the matrix-multiplication, but this shows how to create new variables with varying and fixed inputs to a function:

D <- data.table(id=1:3, fix=1:3, dyn1=1:3, dyn2=1:3) 
setkey(D, id)

foo <-function(fix, dynvector){ fix* dynvector}
D[, paste("new",1:2,sep="_") := lapply( c(dyn1,dyn2), foo, fix=fix), by=id]
#----------
> D
   id fix dyn1 dyn2 new_1 new_2
1:  1   1    1    1     1     1
2:  2   2    2    2     4     4
3:  3   3    3    3     9     9

So you need to use a vector of character values to get columns. This is a bit of an extension to this question: "Why do I need to wrap `get` in a dummy function within a J `lapply` call?"

> D <- data.table(id=1:3, fix=1:3, dyn1=1:3, dyn2=1:3) 
> setkey(D, id)
> id1 <- parse(text=cn)
> foo <-function( fix, dynvector){  fix*dynvector}
> D[, paste("new",1:2,sep="_") := lapply( sapply( cn, function(x) {get(x)}) , foo, fix=fix) ]
Warning message:
In `[.data.table`(D, , `:=`(paste("new", 1:2, sep = "_"), lapply(sapply(cn,  :
  Supplied 2 columns to be assigned a list (length 6) of values (4 unused)
> D
   id fix dyn1 dyn2 new_1 new_2
1:  1   1    1    1     1     2
2:  2   2    2    2     2     4
3:  3   3    3    3     3     6
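The warning above appears to come from sapply() simplifying the two columns into a 3x2 matrix, so lapply() then walks over its six individual cells instead of over two columns. A minimal sketch that avoids this, assuming the goal is one new column per name in cn, is to let .SDcols select the columns so that lapply() sees a list of whole columns. The name new_cols is introduced here only for illustration.

```r
library(data.table)

D  <- data.table(id = 1:3, fix = 1:3, dyn1 = 1:3, dyn2 = 1:3)
cn <- paste0("dyn", 1:2)   # column names held as strings

foo <- function(fix, dynvector) fix * dynvector

# One output name per input column (hypothetical naming scheme)
new_cols <- paste("new", seq_along(cn), sep = "_")

# .SD is a list of the columns named in cn, so lapply() calls foo once
# per column; 'fix' is passed by name and each column fills 'dynvector'.
D[, (new_cols) := lapply(.SD, foo, fix = fix), .SDcols = cn]
print(D)
```

With the sample data this should assign both new columns without the "unused" warning, because the right-hand side is a list of exactly two vectors.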

You could probably also use the methods in "create an expression from a function for data.table to eval".

IRTFM
  • Hi, thanks for the effort, but that does not answer my question. The function `foo(fix,dynvector)` is irrelevant and serves only to illustrate the problem. (What I want to do with the function is much more complex). The main problem is a function with two arguments, one is a vector, where this vector needs to be created dynamically from multiple variables in a data.table. The output of this function is a scalar. Your solution uses the names of the columns (dyn1, dyn2) as input, instead of a vector holding the names of the columns as strings (<"dyn1","dyn2">). – JBJ May 10 '14 at 18:40
  • Ah wrapping solves it. The solution is not 100 % in your answer, but closely based on it. What do I do regarding accepting your answer? Do I now also accept your answer as solution? – JBJ May 10 '14 at 20:39
  • That's up to you. Clearly you should vote it as useful, but if there is a better answer, you can post it and checkmark your own answer. – IRTFM May 10 '14 at 21:11