
Note that in this question I want a column of functions, so it's not a duplicate of the more common question "how do I create a new column using a function?".

I'll demonstrate with a simple working example. The func column is a list of functions.

library(data.table)
test <- data.table(n = c(1, 1, 2, 2), func = c(min, min, max, max))

The syntax to actually use the function is a little unexpected to me but I can get it to work like this:

test[1, func][[1]](c(1,2)) # the unexpected part being that I have to include [[1]]

Now, let's create a larger data.table, and add the goal that I want the func column to be based on the n column. Using data.table::fcase results in an error:

test3 <- data.table(n = c(1, 1, 2, 2))
test3[, func := fcase(n == 1, min,
                      n == 2, max)]
# Error in fcase(n == 1, min, n == 2, max) : 
#  invalid type/length (builtin/4) in vector allocation

Why does this method fail? And how can I update the assignment of func to be based on column n? I'm curious if there is a data.table solution to this.

Henrik
gabagool
  • `test3[, func := c(min, max)[n]]` – jblood94 May 03 '23 at 00:47
  • Thanks @jblood94. Not sure which solution I'll go with but I love the addition – gabagool May 03 '23 at 15:39
  • I doubt performance is a concern here, but generally, subsetting operations will be faster than `fcase` or `ifelse`, and often even joins (as in this case). Also, it is easy to generalize with `match` (e.g., `test3[, func := fun[match(n, key)]]`, where `key` has a 1-to-1 correspondence with `fun`). – jblood94 May 03 '23 at 16:30
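
The `match` lookup mentioned in the last comment could be sketched roughly as follows. The names `keys` and `funs` are hypothetical (renamed from the comment's `key` and `fun` to avoid confusion with data.table's own `key()`), and the test input `c(3, 7)` is only for illustration.

```r
library(data.table)

# Hypothetical lookup table: keys[i] pairs with funs[[i]]
keys <- c(2, 1)
funs <- c(max, min)   # c() on functions yields a list

test3 <- data.table(n = c(1, 1, 2, 2))

# `[` on a list returns a list, so func becomes a list-column
test3[, func := funs[match(n, keys)]]

# Apply each row's stored function to the same illustrative input
res <- sapply(test3$func, function(f) f(c(3, 7)))
res
# 3 3 7 7
```

Because the lookup goes through `match`, the values in `n` do not have to be 1, 2, ... in order; they only need to appear in `keys`.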

2 Answers

  1. From your statement "I have to include [[1]]": func is a list-column, so extracting one row with [ still returns a list, just as list(1, 2, 3)[2] does; it takes [[ to pull out the element itself. Your issue is the general difference between [ and [[ (see "The difference between bracket [ ] and double bracket [[ ]] for accessing the elements of a list or dataframe" and "Dynamically select data frame columns using $ and a character value").

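  2. Related to #1, each value passed to fcase needs to be a list. A bare function such as min cannot be an element of an atomic vector, which appears to be why fcase fails when it tries to allocate a result of type builtin and length 4. Wrapping each function in list() makes the result a list-column.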

    test3[, func := fcase(n == 1, list(min), n == 2, list(max))]
    test3
    #        n          func
    #    <num>        <list>
    # 1:     1 <function[1]>
    # 2:     1 <function[1]>
    # 3:     2 <function[1]>
    # 4:     2 <function[1]>
    

    This can also be done from test with an update join, as suggested by @Henrik:

    test3 <- data.table(n = c(1, 1, 2, 2))
    test3[test, func := i.func, on = .(n)]
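
As a small illustration of point 1, here is a base R sketch (no data.table needed) of how `[` and `[[` differ on a list of functions. The object `funcs` is a hypothetical stand-in for the `func` column of `test`.

```r
# Hypothetical stand-in for test$func: a plain list of functions
funcs <- list(min, min, max, max)

one_cell <- funcs[1]     # `[` keeps the container: a list of length 1
one_fun  <- funcs[[1]]   # `[[` extracts the element: the function itself

is.list(one_cell)        # TRUE
is.function(one_fun)     # TRUE
one_fun(c(1, 2))         # 1, the same call as test[1, func][[1]](c(1, 2))

# Functions cannot sit in an atomic vector, so c() falls back to a list
is.list(c(min, max))     # TRUE
```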
    
r2evans
  • Not a direct answer to OP, but a join may also be used (`test3[.(n = 1:2, func = c(min, max)), on = .(n), func := func]`; which is faster on larger data, if speed ever would be an issue). – Henrik May 02 '23 at 21:46
  • @Henrik, I had thought about it but wasn't entirely sure that's what the OP was gearing towards, but on reading it again that makes sense. – r2evans May 02 '23 at 21:49
  • Thanks to all of you who have responded. As usual, my actual use case is more complicated, so the multiple solutions are really helpful. @Henrik – gabagool May 03 '23 at 15:38

This is perhaps redundant, since I don't know how many functions the OP wants to use or how complex they are. I would be interested to see a scenario showing how the OP wants to use the functions; it looks like some sort of lookup table where one wants to match n and then run the corresponding function. If so, it is not really necessary to store them as functions: you can call a function by its name when needed. Nevertheless, here is my solution.

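library(data.table)
dt <- data.table(n = c(1, 1, 2, 2))  # assumed to match test3 from the question

# Map n to a function name, then look each name up with get()
dt[, func := factor(n, levels = 1:2, labels = c("min", "max"))]
dt[, func := apply(.SD, 1, get), .SDcols = "func"]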

dt

#    n          func
# 1: 1 <function[1]>
# 2: 1 <function[1]>
# 3: 2 <function[1]>
# 4: 2 <function[1]>
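
The "call it by name" alternative mentioned above might look something like this. It is only a sketch: the columns `fname` and `result` and the input `vals` are hypothetical names introduced for illustration.

```r
library(data.table)

dt <- data.table(n = c(1, 1, 2, 2))

# Store the function name (a plain character column), not the function
dt[, fname := c("min", "max")[n]]

# Resolve the name only at call time; do.call() accepts a function name
vals <- c(3, 7)   # illustrative input
dt[, result := vapply(fname, function(nm) do.call(nm, list(vals)),
                      numeric(1), USE.NAMES = FALSE)]
dt$result
# 3 3 7 7
```

A character column prints and joins like any other atomic column, which can be convenient if the lookup table is also used elsewhere.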
Merijn van Tilborg