R Data.Table - APPLY/SHIFT processing by ROW (Retain or Dynamic Shift Processing)

Question

I would like to process data for each row. Let's say if we have collected "mpg" value for two "cyl" for 4 days. I would like to derive minimum mpg value relative to day)

Original Data

** day,cyl,mpg**

1,4,34.4
2,4,21.3
3,4,23.3
4,4,25.0
1,3,23.0
2,3,27.0
3,3,18.3
4,3,17.3

Expected Output

** day,cyl,mpg,min_mpg**

1,4,34.4,34.4
2,4,21.3,21.3
3,4,23.3,21.3
4,4,25.0,21.3
1,3,23.0,23.0
2,3,27.0,23.0
3,3,18.3,18.3
4,3,17.3,17.3

I have given a few thoughts as below...

For Loop processing (which is really not most efficient options)
APPLY and SHIFT function (retain minimum value from the previous row processing in a global variable and reset it to NA for each GROUP. I was unsuccessful to retain minimum mpg value into a global variable)
APPLY and SHIFT function (shift "-1" all the way up to row #1 for each row. Kind of putting a loop in APPLY function. This option might be doing lot more redundant processing)

I tried to use rowShift function as described in the below blog but my requirement is that I need to shift dynamically Use a value from the previous row in an R data.table calculation

Is there any "vectorized" option available? OR Traditional FOR LOOP will be the only option? I prefer option using base R (either data frame or data table)

score 1 · Accepted Answer · answered Sep 29 '19 at 22:30

We can use cummin

library(dplyr)
df1 %>% 
   group_by(cyl) %>% 
   mutate(min_mpg = cummin(mpg))
# A tibble: 8 x 4
# Groups:   cyl [2]
#    day   cyl   mpg min_mpg
#  <int> <int> <dbl>   <dbl>
#1     1     4  34.4    34.4
#2     2     4  21.3    21.3
#3     3     4  23.3    21.3
#4     4     4  25      21.3
#5     1     3  23      23  
#6     2     3  27      23  
#7     3     3  18.3    18.3
#8     4     3  17.3    17.3

In base R, we can use ave

df1$min_mpg <- with(df1, ave(mpg, cyl, FUN = cummin))

Or using data.table

library(data.table)
setDT(df1)[, min_mpg := cummin(mpg), by = cyl][]

data

df1 <- structure(list(day = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), cyl = c(4L, 
4L, 4L, 4L, 3L, 3L, 3L, 3L), mpg = c(34.4, 21.3, 23.3, 25, 23, 
27, 18.3, 17.3)), class = "data.frame", row.names = c(NA, -8L
))

R Data.Table - APPLY/SHIFT processing by ROW (Retain or Dynamic Shift Processing)

Original Data

Expected Output

1 Answers1

data