
I have the following data.table

library(data.table)
x <- data.table(f1 = 1:3, f2 = 3:5)

I would like to apply a function to each row of the data.table. The function func.text takes f1 and f2 as arguments, does something with them, and returns a computed value. Assume, as an example:

func.text <- function(arg1,arg2){ return(arg1 + exp(arg2))}

but my real function is more complex (it uses loops, among other things), although it still returns a single computed value. What would be the best way to accomplish this?

broccoli

4 Answers


The best way is to write a vectorized function, but if you can't, then perhaps this will do:

x[, func.text(f1, f2), by = seq_len(nrow(x))]
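A minimal sketch comparing the two routes, assuming the data.table package is installed and using the example func.text from the question. That example happens to be vectorized (+ and exp() both work element-wise), so the by-row call, which runs func.text once per one-row group, and the plain call, which runs it once on whole columns, give the same numbers. The by-row form is only needed when the function cannot take vectors.

```r
library(data.table)

x <- data.table(f1 = 1:3, f2 = 3:5)
func.text <- function(arg1, arg2) { return(arg1 + exp(arg2)) }

# By row: each row is its own group, so func.text() is called 3 times
# with scalar arguments; the result comes back in column V1
by_row <- x[, func.text(f1, f2), by = seq_len(nrow(x))]

# Vectorized: one call on whole columns, returning a plain numeric vector
vectorized <- x[, func.text(f1, f2)]

print(by_row)
print(vectorized)
```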
eddi
  • Ah, didn't think of using the by = 1:nrow(x) trick. Nice one – broccoli Aug 21 '14 at 17:31
  • Not sure why not just use `.I`, e.g., something like `x[, func.text(f1, f2), by = .I]` – David Arenburg Aug 23 '14 at 20:19
  • @DavidArenburg I have no idea what `by=.I` is doing. It's somehow not quite the same as `by=1:nrow(x)`, as you can check by comparing e.g. `x[, 1, by = .I]` and `x[, 1, by = 1:nrow(x)]`. – eddi Aug 24 '14 at 05:29
  • would be great though if that worked as you'd expect it to work (also `by=1:.N`) – eddi Aug 24 '14 at 05:31
  • Yeah, you're probably right, but in this case it doesn't even look like the OP needs a `by` statement, as his function is already vectorized over the whole data set, so even `x[, func.text(f1, f2)]` will give the desired result. The problem is that the result loses the `data.table` class and becomes a numeric vector. Adding `by = .I` keeps the class, but I'm not sure why or how (I'll probably get an angry comment from @Arun pointing out my lack of understanding of `data.table` soon) – David Arenburg Aug 24 '14 at 10:27
  • Hmm... Could you explain why writing a vectorized function would be better than this? To me this looks very clean and easy (cleaner and easier than vectorizing the function). – Pekka Jan 18 '15 at 17:38
  • @Pekka having a vectorized function would result in a lot fewer function calls and would be faster – eddi Jan 20 '15 at 15:52
  • This breaks when `x` has zero rows. – James Hirschorn Aug 17 '18 at 01:44
  • @JamesHirschorn if you have zero rows, then you're solving a different problem from "applying function to rows" – eddi Aug 17 '18 at 15:02
  • @eddi I disagree. I have production code that needs to apply a function to the rows of some given `data.table`, but the # of rows of the `data.table` is not known in advance and could potentially be zero. – James Hirschorn Aug 17 '18 at 16:10
  • @JamesHirschorn that doesn't change that it's a different problem, but I'm too lazy to keep arguing this minor point - new edit will be fine – eddi Aug 17 '18 at 18:01

The most elegant way I've found is with mapply:

x[, value := mapply(func.text, f1, f2)]
x
#    f1 f2     value
# 1:  1  3  21.08554
# 2:  2  4  56.59815
# 3:  3  5 151.41316
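The question mentions that the real function uses loops, which is where mapply earns its keep. Below is a sketch with a hypothetical scalar-only function, func.loop (not from the question, purely illustrative): its loop bound and its if condition assume single numbers, so calling it directly on whole columns would not work row by row, while mapply hands it one f1/f2 pair at a time.

```r
library(data.table)

x <- data.table(f1 = 1:3, f2 = 3:5)

# Hypothetical scalar-only function (not from the question): the loop bound
# and the `if` condition both assume arg1 and arg2 are single numbers
func.loop <- function(arg1, arg2) {
  total <- 0
  for (i in seq_len(arg1)) {
    total <- total + exp(arg2) / i
  }
  if (total > 100) total else -total
}

# mapply() calls func.loop once per row, pairing f1[i] with f2[i]
x[, value := mapply(func.loop, f1, f2)]
print(x)
```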

Or with the purrr package:

x[, value := purrr::pmap_dbl(.(f1, f2), func.text)]

If your situation allows it, another approach is to name the function's arguments after the columns they should receive:

library("purrr")

# arguments match the names of the columns, dots collect other 
# columns existing in the data.table
func.text <- function(f1, f2, ...) { return(f1 + exp(f2)) }

# use `set` to modify the data.table by reference
purrr::pmap_dbl(x, func.text) %>%
  data.table::set(x, i = NULL, j = "value", value = .)

print(x)
##    f1 f2     value
## 1:  1  3  21.08554
## 2:  2  4  56.59815
## 3:  3  5 151.41316
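If you'd rather avoid the purrr dependency, here is a base-R sketch of the same match-by-name idea: do.call() passes every column to mapply() as a named argument. This is an alternative I'm suggesting, not part of the answer above; like that answer, it assumes func.text has arguments named after the columns and a `...` to absorb any others.

```r
library(data.table)

x <- data.table(f1 = 1:3, f2 = 3:5)

# Argument names match the column names; `...` absorbs any extra columns
func.text <- function(f1, f2, ...) { return(f1 + exp(f2)) }

# do.call() turns the columns into named arguments of mapply(), i.e.
# mapply(FUN = func.text, f1 = x$f1, f2 = x$f2), so columns are matched
# to the function's arguments by name rather than by position
value <- do.call(mapply, c(list(FUN = func.text), as.list(x)))

# add the result by reference, as in the purrr version
set(x, j = "value", value = value)
print(x)
```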
mlegge

We can identify rows with .I, the special symbol that holds the row numbers, and then group by it:

library(data.table)

dt_iris <- data.table(iris)
dt_iris[, ..I := .I]  # store the row number in a column named ..I

## Let's define some function
some_fun <- function(dtX) {
    print('hello')
    return(dtX[, Sepal.Length / Sepal.Width])
}

## by row: some_fun() is called once per row, printing 'hello' each time
dt_iris[, some_fun(.SD), by = ..I] # or simply: dt_iris[, some_fun(.SD), by = .I]

## vectorized calculation: a single call over the whole table, so 'hello' is printed once
some_fun(dt_iris) 
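To see the difference without reading 150 printed lines, a sketch can count calls instead of printing. The counter (calls) and the row_id column are illustrative names, not from the answer; row_id plays the same role as ..I. It assumes the data.table package is installed.

```r
library(data.table)

dt_iris <- data.table(iris)

# Illustrative call counter used instead of print('hello')
calls <- 0
some_fun <- function(dtX) {
  calls <<- calls + 1
  dtX[, Sepal.Length / Sepal.Width]
}

# By row: group on an explicit row id, so some_fun() sees one row at a time
dt_iris[, row_id := .I]
by_row <- dt_iris[, some_fun(.SD), by = row_id]
calls_by_row <- calls

# Vectorized: one call over the whole table
calls <- 0
vectorized <- some_fun(dt_iris)
calls_vectorized <- calls

cat("calls by row:", calls_by_row, " vectorized:", calls_vectorized, "\n")
```

Both give the same ratios; the by-row version just pays one function call per row.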
Cron Merdek
  • I was under the impression that at some point it was possible to use `by=.I` directly in the third argument. No? – Stéphane Laurent Feb 05 '16 at 02:07
  • @StéphaneLaurent Sure, it is just so that the user can see the data that `by` is applied on. I have updated the post to remove any doubt ;) – Cron Merdek Feb 05 '16 at 10:18
  • Sorry CronAcronis, maybe my comment was not clear. I mean it was possible to directly do `dt[, y:=somefun(x), by=I]` in the past, but it is not possible now. Or maybe my memory is wrong. – Stéphane Laurent Feb 05 '16 at 12:31
  • @StéphaneLaurent I think you meant `.I`, so you can do `dt_iris[, some_fun(.SD), by = .I]`, with dot. – Cron Merdek Feb 05 '16 at 12:54
  • Yes, sorry, I meant `.I`. I tried it yesterday and it didn't work... Hmm, I have just tried it now and it works. Sorry, I was surely too tired :) – Stéphane Laurent Feb 05 '16 at 14:43
  • What's the meaning of ..I ? – skan Nov 01 '19 at 13:07
  • @skan It's just for convenience, to keep the row counter as an actual column; no special meaning. – Cron Merdek Nov 07 '19 at 16:28
  • Note that `.I` is meant to be used as a `j` argument in `data.table`, and not in the `by` clause. In DT >1.12.4 it doesn't seem to work either. @CronMerdek, can you re-evaluate your answer? – Davor Josipovic Jun 15 '20 at 08:47

This is a pretty compact syntax (note that Map() returns a list, so the new column c is a list column):

x[, c := .(Map(func.text, f1, f2))]
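Because Map() returns a list, the column created above holds one number per list cell. If you need an ordinary numeric column, a sketch with unlist() flattens it, assuming the function returns exactly one value per row; the column name c_num is just illustrative.

```r
library(data.table)

x <- data.table(f1 = 1:3, f2 = 3:5)
func.text <- function(arg1, arg2) { return(arg1 + exp(arg2)) }

# Map() returns a list, so wrapping it in .() stores a list column
x[, c := .(Map(func.text, f1, f2))]

# unlist() flattens the per-row results into an ordinary numeric column;
# this assumes func.text returns exactly one value per row
x[, c_num := unlist(Map(func.text, f1, f2))]

str(x)
```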
teemoleen