5

I am trying to move some of my slower processes in dplyr to using data.table, however can not seem to find an efficient way of using a "mutate_at" type approach in data.table. Especially, when it comes to naming the new variables created & applying more than 1 function to multiple columns.

Below I use mutate_at to apply 2 different functions to 2 different columns with associated naming + using a group by statement. I want to be able to replicate this easily in data.table.

library(tibble)
library(zoo)

Data = tibble(A = rep(c(1,2),50),
              B = 1:100,
              C = 101:200)

Data %>% 
    group_by(A) %>% 
    mutate_at(vars(B,C), funs(Roll.Mean.Week = 7 * rollapply(., width = 7, mean, align = "right", fill = 0, na.rm = T, partial = T),
                              Roll.Mean.Two.Week = 7 * rollapply(., width = 14, mean, align = "right", fill = 0, na.rm = T, partial = T))) %>% 
    ungroup()
Joris C.
  • 5,721
  • 3
  • 12
  • 27
JFG123
  • 577
  • 5
  • 13

2 Answers2

4

With data.table, we can specify the columns of interest in .SDcols, loop through the .SD with lapply and apply the function of interest. Here, the funcion rollapply is repeated with only change in width parameter. So, it may be better to create a function to avoid repeating the whole arguments. Also, while applying the function (f1), the output can be kept in a list, later unlist with recursive = FALSE and assign (:=) to columns of interest

library(data.table)
library(zoo)
nm1 <- c("B", "C")
nm2 <- paste0(nm1, "_Roll.Mean.Week")
nm3 <- paste0(nm1, "_Roll.Mean.Two.Week")
f1 <- function(x, width) rollapply(x, width = width, mean,
        align = "right", fill = 0, na.rm = TRUE, partial = TRUE)
setDT(Data)[, c(nm2, nm3) := unlist(lapply(.SD, function(x)
  list(f1(x, 7), f1(x, 14))), recursive = FALSE), by = A, .SDcols = nm1]
head(Data)
#   A B   C B_Roll.Mean.Week C_Roll.Mean.Week B_Roll.Mean.Two.Week C_Roll.Mean.Two.Week
#1: 1 1 101                1                1                  101                  101
#2: 2 2 102                2                2                  102                  102
#3: 1 3 103                2                2                  102                  102
#4: 2 4 104                3                3                  103                  103
#5: 1 5 105                3                3                  103                  103
#6: 2 6 106                4                4                  104                  104

Note that funs is deprecated in tidyverse and in its place, can use list(~ or just ~

Data %>% 
    group_by(A) %>% 
    mutate_at(vars(B,C), list(Roll.Mean.Week =  ~f1(., 7),
                              Roll.Mean.Two.Week = ~ f1(., 14)))%>% 
    ungroup()
akrun
  • 874,273
  • 37
  • 540
  • 662
  • This solution does not satisfy the request to have the new column names added to the original tibble, as mutate_at does. Noted regarding the deprecated funs. – JFG123 Aug 07 '19 at 03:34
  • @JackFahey-Gilmour. For that it is just `:=` (updated the answer) – akrun Aug 07 '19 at 03:35
  • what about the ability to easily apply two function calls to the same set of variables, as specified above. – JFG123 Aug 07 '19 at 03:36
  • @JackFahey-Gilmour. Have you checked my update. I thought it is the same repeating stuff, so I didn't earlier update – akrun Aug 07 '19 at 03:40
  • Is there anyway to avoid multiple chains? Or do you have to keep chaining the functions in seperate calls? – JFG123 Aug 07 '19 at 03:42
  • @JackFahey-Gilmour. You can do it single chain itself, but it is more clear this way – akrun Aug 07 '19 at 03:43
  • Is it possible to provide both solutions? I think that would provide a well rounded answer. – JFG123 Aug 07 '19 at 03:45
  • @JackFahey-Gilmour. Updated the answer – akrun Aug 07 '19 at 03:45
  • I just realised that the allocation of column names is not in order. It should be, B_Roll.Mean.Week, B_Roll.Mean.Two.Week, C_Roll.Mean.Week, C_Roll.Mean.Two.Week – JFG123 Aug 07 '19 at 04:58
  • @JackFahey-Gilmour. You can change the order with `setcolorder(Data, yourorder)` – akrun Aug 07 '19 at 13:03
1

And also from Apply multiple functions to multiple columns in data.table

f <- function(x) lapply(c(Roll.Mean.Week=7L, Roll.Mean.Two.Week=14L),
    function(n) rollapplyr(x, width=n, mean, fill=0L, na.rm=TRUE, partial=TRUE))
DT[, unlist(lapply(.SD, f), recursive=FALSE), A, .SDcols=B:C]

output:

     A B._Roll.Mean.Week B._Roll.Mean.Two.Week C._Roll.Mean.Week C._Roll.Mean.Two.Week
  1: 1                 1                     1               101                   101
  2: 1                 2                     2               102                   102
  3: 1                 3                     3               103                   103
  4: 1                 4                     4               104                   104
  5: 1                 5                     5               105                   105
  6: 1                 6                     6               106                   106
  7: 1                 7                     7               107                   107
  8: 1                 9                     8               109                   108
  9: 1                11                     9               111                   109
 10: 1                13                    10               113                   110
 [trunc...]
chinsoon12
  • 25,005
  • 4
  • 25
  • 35