How to calculate the differences between the row with the minimum value and the subsequent rows, after sorting by time, in R?

Question

I have a dataset, created by the code below. For each ID, after ordering the rows by time, I want the difference between the row with the lowest value before the values rise and each subsequent row.

df <- data.frame(
  ID    = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3),
  time  = c(6, 12, 18, 24, 30, 3, 9, 21, 6, 12, 18, 24, 30, 36),
  value = c(0.9, 0.7, 2.8, 3.8, 0.5, 1.3, 3.1, 0.8, 1.2, 0.6, 3.7, 1.8, 0.9, 0.3)
)

So for ID 1, I would like the difference between 0.7 and each subsequent row. The overall result I want is this:

library(dplyr)
df1 <- df %>%
  mutate(value.diff = c(NA, 0, 2.1, 3.1, -0.2, 0, 1.8, -0.5, NA, 0, 3.1, 1.2, 0.3, -0.3))

I tried the following code:

df <- df[order(df$ID, df$time), ]
df <- df %>% group_by(ID) %>% mutate(value.diff = diff(value - min(value)))

But it does not do what I need. I would appreciate any help with this.
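A quick base R check makes both problems with this attempt visible. This is a sketch using only the sample data above: in every ID the overall minimum sits in the last row, and diff() returns one value fewer than it is given, so it cannot supply a per-row column (dplyr's mutate() expects one value per row, or a single value, in each group).

```r
df <- data.frame(
  ID    = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3),
  time  = c(6, 12, 18, 24, 30, 3, 9, 21, 6, 12, 18, 24, 30, 36),
  value = c(0.9, 0.7, 2.8, 3.8, 0.5, 1.3, 3.1, 0.8, 1.2, 0.6, 3.7, 1.8, 0.9, 0.3)
)
df <- df[order(df$ID, df$time), ]

# Position of the overall minimum within each ID, and the size of each ID
min_pos    <- tapply(df$value, df$ID, which.min)
group_size <- tapply(df$value, df$ID, length)
min_pos     # 5, 3, 6 for IDs 1, 2, 3
group_size  # also 5, 3, 6: the minimum is always the last row

# diff() drops one element, so its result is shorter than the group
x <- df$value[df$ID == 1]
length(x)        # 5
length(diff(x))  # 4
```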

All of your min values are the last value in each group, so you'll have no differences. — r2evans, Mar 21 '21 at 23:18
@r2evans I see, but how can I avoid this, pick up the pre-rise minimum, and then calculate the differences from the subsequent rows? — Biostats, Mar 21 '21 at 23:24
Are you referring to the first _local minimum_? If so, use that as a search term to get started. Also, please update your question with the desired result. — Henrik, Mar 21 '21 at 23:29
Your question specifically mentioned the difference between the row with the lowest `value` and *subsequent* values. I'm assuming that before the minimum, you'd have `NA`. Perhaps you could provide the expected output given this sample data. — r2evans, Mar 21 '21 at 23:32
@Henrik, I have added the resulting dataset I am looking for. I am very new to these rolling row operations, especially when they are conditioned on other variables in the dataset and involve local minima or maxima. I would really appreciate any help. Thanks! — Biostats, Mar 21 '21 at 23:38
@r2evans I have added the resulting dataset I am looking for. — Biostats, Mar 21 '21 at 23:38

r2evans · Accepted Answer · 2021-03-22T13:12:40.530

library(dplyr)
df %>%
  group_by(ID) %>%
  mutate(
    ind = which.max(c(diff(value) > 0, TRUE)),
    value.diff = replace(value - value[ind[1]], row_number() < ind, NA_real_)
  ) %>%
  ungroup() %>%
  select(-ind)
# # A tibble: 14 x 4
#       ID  time value value.diff
#    <dbl> <dbl> <dbl>      <dbl>
#  1     1     6   0.9     NA    
#  2     1    12   0.7      0    
#  3     1    18   2.8      2.10 
#  4     1    24   3.8      3.10 
#  5     1    30   0.5     -0.200
#  6     2     3   1.3      0    
#  7     2     9   3.1      1.8  
#  8     2    21   0.8     -0.5  
#  9     3     6   1.2     NA    
# 10     3    12   0.6      0    
# 11     3    18   3.7      3.1  
# 12     3    24   1.8      1.2  
# 13     3    30   0.9      0.3  
# 14     3    36   0.3     -0.3
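The answer relies on df already being sorted by ID and time, which happens to be true of the sample data. A minimal sketch that does the sort explicitly, as the question asks, wraps the same pipeline in a hypothetical helper, diff_from_first_min() (the name is only an illustration, not part of the answer), with arrange(ID, time) added as the first step:

```r
library(dplyr)

df <- data.frame(
  ID    = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3),
  time  = c(6, 12, 18, 24, 30, 3, 9, 21, 6, 12, 18, 24, 30, 36),
  value = c(0.9, 0.7, 2.8, 3.8, 0.5, 1.3, 3.1, 0.8, 1.2, 0.6, 3.7, 1.8, 0.9, 0.3)
)

# Hypothetical helper: difference from the first local minimum per ID,
# computed only after the rows are ordered by time.
diff_from_first_min <- function(data) {
  data %>%
    arrange(ID, time) %>%  # sort by time within each ID first
    group_by(ID) %>%
    mutate(
      # first row followed by an increase; the trailing TRUE falls back
      # to the last row when a group never rises
      ind = which.max(c(diff(value) > 0, TRUE)),
      value.diff = replace(value - value[ind[1]], row_number() < ind, NA_real_)
    ) %>%
    ungroup() %>%
    select(-ind)
}

# Shuffle the rows to show the result no longer depends on input order
set.seed(1)
shuffled <- df[sample(nrow(df)), ]
res <- diff_from_first_min(shuffled)
res
```

With the sort built in, the shuffled input should give the same table as the sorted one above.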

Explanation:

ind is the row (within each ID) that is immediately followed by the first increase, i.e. the first low point. Appending TRUE to diff(value) > 0 means that a group that never rises falls back to its last row. ind is the same value for every row in a group, so it is a little inefficient in that regard, but it is useful;
value[ind[1]] is the value at the first low point. I chose value[ind[1]] because we only need one of the ind indices to get one value of value, but value[ind] would have worked just as well;
in replace(., ., .), the first argument is the default returned, value - value[ind[1]]; the second is a condition marking exceptions to that default, in this case "rows before the low point"; the third is the replacement value, NA_real_. I could just as easily have used NA, but I often prefer being explicit about which type of NA I expect: some tools, such as dplyr::if_else and data.table::fifelse, can error if the classes of the yes/no arguments are not identical, and since class(NA) is "logical", I chose the exact match. (There are more than six types of NA, fyi.) This is not strictly required in replace. Another fyi: I often prefer replace because it is more length-safe and does not strip the class of non-simple objects (see How to prevent ifelse() from turning Date objects into numeric objects).
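Two of the points in this explanation can be checked on plain vectors. In this base R sketch, first_min_index() is a hypothetical helper holding the same expression the answer uses for ind; the second part shows replace() keeping the Date class where ifelse() drops it.

```r
# Hypothetical helper: index of the first row followed by an increase.
# The trailing TRUE makes a series that never rises use its last row.
first_min_index <- function(value) which.max(c(diff(value) > 0, TRUE))

first_min_index(c(0.9, 0.7, 2.8, 3.8, 0.5))  # 2 (ID 1)
first_min_index(c(1.3, 3.1, 0.8))            # 1 (ID 2)
first_min_index(c(3, 2, 1))                  # 3 (never rises)

# replace() keeps the class of its first argument; ifelse() builds its
# result from the logical test, so the Date class is lost
d <- as.Date(c("2021-03-21", "2021-03-22", "2021-03-23"))
kept <- replace(d, seq_along(d) < 2, NA)
lost <- ifelse(seq_along(d) < 2, NA, d)
class(kept)  # "Date"
class(lost)  # "numeric"
```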

Thank you very much! Could you please explain this part of the code: "value.diff = replace(value - value[ind[1]], row_number() < ind, NA_real_)"? — Biostats, Mar 22 '21 at 01:49
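To unpack the replace() call asked about in this comment, here is a base R walk-through for ID 1 alone. It is a sketch that assumes the group is already ordered by time and uses seq_along(value) as a stand-in for row_number(), which numbers the rows 1, 2, ... within each group.

```r
value <- c(0.9, 0.7, 2.8, 3.8, 0.5)          # ID 1, ordered by time
ind   <- which.max(c(diff(value) > 0, TRUE))  # 2: the first low point

default    <- value - value[ind]              # difference from the low point, every row
before     <- seq_along(value) < ind          # stand-in for row_number() < ind
value.diff <- replace(default, before, NA_real_)

default     # 0.2  0.0  2.1  3.1 -0.2
before      # TRUE FALSE FALSE FALSE FALSE
value.diff  # NA  0.0  2.1  3.1 -0.2
```

The first argument supplies a value for every row, and the condition only blanks out the rows before the low point.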
