When writing functions, the following function works if it is passed a data.table as an argument:

myDelta <- function(DT, col.a = "Sepal.Length", baseline = 5){
  DT[, delta := get(col.a) - baseline]
  return(DT[])
}

It could be called like this:

library(data.table)
irisDT <- data.table(iris)
myDelta(irisDT)

However, this has a few problems:

  1. Assigning the output to a new object works, but the original is also modified in place, which can be an awkward side effect.
  2. I suspect (though I haven't tested) that this does not take full advantage of data.table's speed optimisations.
  3. This is not the 'data.table way', which would be something more like irisDT[, myDelta()]. But because the function expects a DT argument which is a data.table, I end up repeating myself by writing irisDT[, myDelta(irisDT)].
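The side effect in point 1 can be avoided with a copy. The following is only a minimal sketch: `myDeltaCopy` is a hypothetical name, not part of the original code, and it assumes that taking a full copy of the table is acceptable for the size of data involved.

```r
library(data.table)

# Hypothetical variant of myDelta that works on a copy of its input
myDeltaCopy <- function(DT, col.a = "Sepal.Length", baseline = 5) {
  out <- copy(DT)                        # deep copy, so := below cannot modify DT
  out[, delta := get(col.a) - baseline]  # add the column to the copy only
  out[]                                  # return the copy so that it prints
}

irisDT <- data.table(iris)
res <- myDeltaCopy(irisDT)

"delta" %in% names(irisDT)  # FALSE: the original is untouched
"delta" %in% names(res)     # TRUE: only the returned copy has the new column
```

The trade-off is memory: `copy()` duplicates the whole table, which gives up the main benefit of updating by reference with `:=`.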

Explicitly, I would like to know: what am I missing about writing functions that would allow them to inherit from the data.table object they are called in, without the data.table object having to be passed as a function argument?

Additionally, I am curious about what best practice would be for writing a function that can be called from inside or outside a data.table object in this kind of use case, where the goal is to calculate an output column from existing columns in the object. Do you write for just one or the other?

I may have this entirely backwards, though; if so, please let me know.

DaveRGP
  • I don't understand what you are actually asking. What is your goal? Have your function work but not change the input? – Roland Nov 14 '16 at 15:35
  • My function works as I expect it to; however, I feel I have written it badly. My understanding of data.table is that you call functions to modify the table within the j argument of `[i, j, by, etc....]`. For instance, I can call sum on the column `Sepal.Width` via `irisDT[, sum(Sepal.Width)]`, without writing the table name within `sum()`; however, for my function above that does not work. `irisDT[, myDelta()]` gives `Error in myDelta() : argument "DT" is missing, with no default`. I would like my function to not have to re-specify the table name when used in a data.table. – DaveRGP Nov 14 '16 at 15:57
  • Note how `sum` does not assign a column. You can't use `:=` (or `set`) inside a function without referencing the data.table. – Roland Nov 14 '16 at 16:12
  • That's a good observation that I had missed. Thank you. I'm still curious as to whether I can achieve the functionality I would like and further enhance my function. – DaveRGP Nov 14 '16 at 16:24
  • can you provide an expected output of what you are trying to achieve... Nevertheless, is this what you are asking? `irisDT[, .SD - 5, .SDcols = "Sepal.Length"]` or in place `irisDT[, Sepal.Length.Baseline := .SD - 5, .SDcols = "Sepal.Length"]` – John Smith Nov 14 '16 at 20:31
  • also basic operations can be done as such `irisDT[, .SD - 5, .SDcols = c("Sepal.Length", "Sepal.Width")]` or `irisDT[, c(paste0("delta", c("Sepal.Length", "Sepal.Width"))) := .SD - 5, .SDcols = c("Sepal.Length", "Sepal.Width")]` – John Smith Nov 14 '16 at 20:48

1 Answer

You apply a function on a subset of the data.table selected by [i, j, by, .SDcols]. Example:

myDelta2 <- function(x, baseline = 5) {
  return(x - baseline)  # use the baseline argument rather than a hard-coded 5
}

library(data.table)
irisDT <- data.table(iris)
irisDT[, lapply(.SD, myDelta2), .SDcols = c("Sepal.Length", "Sepal.Width")]

Of course, this can also simply be written as:

irisDT[, .SD - 5, .SDcols = c("Sepal.Length", "Sepal.Width")]

or, in place:

irisDT[, c(paste0("delta", c("Sepal.Length", "Sepal.Width"))) := .SD - 5, .SDcols = c("Sepal.Length", "Sepal.Width")]
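To tie this back to the question, one way to get a function that works both inside and outside a data.table is to have it take a plain vector and leave the `:=` assignment to the caller in `j`. This is a minimal sketch under that assumption; `deltaFrom`, `delta` and `delta_grp` are hypothetical names chosen for illustration.

```r
library(data.table)

# Hypothetical helper: works on any numeric vector and knows nothing about data.table
deltaFrom <- function(x, baseline = 5) {
  x - baseline
}

irisDT <- data.table(iris)

# Inside a data.table: columns are visible in j, so the table name is not repeated
irisDT[, delta := deltaFrom(Sepal.Length)]

# The same helper combined with by: the baseline here is each species' own mean
irisDT[, delta_grp := deltaFrom(Sepal.Length, baseline = mean(Sepal.Length)), by = Species]

# Outside a data.table: an ordinary call on a vector
deltaFrom(c(5.1, 4.9), baseline = 5)
```

Keeping the assignment at the call site means the function never needs a reference to the table, which is why it does not run into the restriction on `:=` mentioned in the comments.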

Suggest you check out this excellent resource

PS: if you are wondering about .SD then read this

John Smith