2

I have a data.table and I would like to apply a function (e.g. paste, sum) to some of the columns. For example:

library(data.table)
x = as.data.table(iris)
columnsSelected = c("Sepal.Width", "Sepal.Length")
x[, A := Sepal.Width + Sepal.Length] # this yields the desired result, but not in the way I want.
x[, A := sum(columnsSelected)] # this does not execute
x[, A := sum(get(columnsSelected))] # this does not yield the desired result
x[, A := sum(.SD), .SDcols = columnsSelected] # this does not yield the desired result
x[, A := do.call(sum, .SD), .SDcols = columnsSelected] # same as above

I am at a loss. Get() is suggested here, but it doesn't seem to work for multiple columns. Another interesting discussion about uses of data.table is here, but .SD and .SDcols only seem to work for aggregations.

Any help would be greatly appreciated.

Community
  • 1
  • 1
  • 1
    Could you describe what your desired result is? It's not clear what you mean by "this yields the desired result, but not in the way I want" – Allen Wang Oct 19 '16 at 20:54

1 Answers1

4

If you want an element wise plus, then just + instead of sum which sum all elements of the vectors together. Besides, since + accepts only two arguments, you can not use do.call with more than two columns, you will need Reduce() instead:

x[, A := Reduce("+", .SD), .SDcols = columnsSelected]

#      Sepal.Length Sepal.Width Petal.Length Petal.Width   Species   A
#   1:          5.1         3.5          1.4         0.2    setosa 8.6
#   2:          4.9         3.0          1.4         0.2    setosa 7.9
#   3:          4.7         3.2          1.3         0.2    setosa 7.9
#   4:          4.6         3.1          1.5         0.2    setosa 7.7
#   5:          5.0         3.6          1.4         0.2    setosa 8.6
# ---                                                                
# 146:          6.7         3.0          5.2         2.3 virginica 9.7
# 147:          6.3         2.5          5.0         1.9 virginica 8.8
# 148:          6.5         3.0          5.2         2.0 virginica 9.5
# 149:          6.2         3.4          5.4         2.3 virginica 9.6
# 150:          5.9         3.0          5.1         1.8 virginica 8.9

As commented by @42, we see how do.call() and Reduce() work with functions as + and paste() by extending columnsSelected variable to three columns:

columnsSelected = c("Sepal.Width", "Sepal.Length", "Petal.Length")

Where do.call("+", ...) gives an error as expected, since "+" is a binary operator and doesn't accept more than two arguments, while do.call() gives it three arguments (the three columns in the columnsSelected; Reduce("+", ...) works as expected since Reduce() add column by column instead of passing all columns to the function at once as do.call does. And for paste, do.call() and Reduce() both work since paste() can accept two or more vectors as arguments. And in this case do.call() is more efficient when you have a lot of columns but shouldn't matter here if you only pass in two or three columns.

x[, A := do.call(paste, .SD), .SDcols = columnsSelected]

#      Sepal.Length Sepal.Width Petal.Length Petal.Width   Species           A
#   1:          5.1         3.5          1.4         0.2    setosa 3.5 5.1 1.4
#   2:          4.9         3.0          1.4         0.2    setosa   3 4.9 1.4
#   3:          4.7         3.2          1.3         0.2    setosa 3.2 4.7 1.3
#   4:          4.6         3.1          1.5         0.2    setosa 3.1 4.6 1.5
#   5:          5.0         3.6          1.4         0.2    setosa   3.6 5 1.4
# ---                                                                        
# 146:          6.7         3.0          5.2         2.3 virginica   3 6.7 5.2
# 147:          6.3         2.5          5.0         1.9 virginica   2.5 6.3 5
# 148:          6.5         3.0          5.2         2.0 virginica   3 6.5 5.2
# 149:          6.2         3.4          5.4         2.3 virginica 3.4 6.2 5.4
# 150:          5.9         3.0          5.1         1.8 virginica   3 5.9 5.1

And x[, A := Reduce(paste, .SD), .SDcols = columnsSelected] gives the same result as do.call().

Update: To pass parameters to the function in do.call() and Reduce, do.call() flattens arguments in the list parameter and take them as if they are individual, so in order to pass parameter to the function in do.call(), we can concatenate the named parameter to the list parameter, i.e:

x[, A := do.call(paste, c(sep = ",", .SD)), .SDcols = columnsSelected]

#      Sepal.Length Sepal.Width Petal.Length Petal.Width   Species           A
#   1:          5.1         3.5          1.4         0.2    setosa 3.5,5.1,1.4
#   2:          4.9         3.0          1.4         0.2    setosa   3,4.9,1.4
#   3:          4.7         3.2          1.3         0.2    setosa 3.2,4.7,1.3
#   4:          4.6         3.1          1.5         0.2    setosa 3.1,4.6,1.5
#   5:          5.0         3.6          1.4         0.2    setosa   3.6,5,1.4
# ---                                                                        
# 146:          6.7         3.0          5.2         2.3 virginica   3,6.7,5.2
# 147:          6.3         2.5          5.0         1.9 virginica   2.5,6.3,5
# 148:          6.5         3.0          5.2         2.0 virginica   3,6.5,5.2
# 149:          6.2         3.4          5.4         2.3 virginica 3.4,6.2,5.4
# 150:          5.9         3.0          5.1         1.8 virginica   3,5.9,5.1

For Reduce(), on the other hand, it allows you to reconstruct an anonymous function, you can create a customized function with sep parameter specified:

x[, A := Reduce(function(x,y) paste(x,y,sep=","), .SD), .SDcols = columnsSelected]

Which gives the same result as do.call() again.

Psidom
  • 209,562
  • 33
  • 339
  • 356
  • I wasn't sure this answered the question until I substituted `"paste"` for `"+"`. And also extended the argument to `.SDcols`. – IRTFM Oct 20 '16 at 00:23
  • @42 Right. Was trying to fix his code at the beginning. But the question is a little bit broad, when OP doesn't specify which function it is trying to use. – Psidom Oct 20 '16 at 02:03
  • Thank you very much for such a detailed reply! If I wanted to use `paste`, how could I pass an extra argument like `sep`? – GerasimosPanagiotakopoulos Oct 20 '16 at 04:04