Calculations for ordered observations using mutate() based on conditional statements

Question

I am trying to calculate growth rates between the specific rows after grouping by another variable (similar to other users).

Here is a sample of my data:

squirrel_id    wt   age    trialdate 
   22639      9.7     0    2017-04-20
   22639      45.9   24    2017-05-14
   22639     130     53    2017-06-12 #caught 3x, 1 trial
   22640     10.3    0     2017-04-20
   22640     49.2    24    2017-05-14
   22640     121     52    2017-06-11
   22640     196     84    2017-07-13 #caught 4x, 2 trials
   23943     12.9     1    2018-04-27
   23943     57.2    26    2018-05-23 #caught 2x, 1 trial
   23760     150     73    2018-06-18
   23760     165     84    2018-06-29 #caught 2x, 2 trials

To get this data in this form, I made sure to arrange_by(squirrel_id) first using the library(dplyr) package.

What I am trying to do is calculate the growth rate between:

last observation and second last observation divided by time that has passed (last_wt-second_last_wt/last_age-second_last_age): [using squirrel_id 22640: (196-121)/(84-52)] and then add a column called “trial” that says “2”
second last observation and third last observation divided by time that has passed (second_last_wt-third_last_wt/second_last_age-third_last_age): [using squirrel_id 22640: (121-49.2)/(52-24)] and then add a column called “trial” that says “1”

There is a catch though:

if a squirrel_id is seen 3 times or less in total (like squirrel_id 22639 and 23943), then they would usually only have had 1 trial and therefore 1 growth rate calculation.
BUT, if 2 of the observations are with an age > 40 days old (like squirrel_id 23760), then they had 2 trials.

My hope is to have a final dataset that looks like:

squirrel_id    wt   age    trialdate    g.rate     trial
   22639      9.7     0    2017-04-20   NA         NA
   22639      45.9   24    2017-05-14   NA         NA
   22639     130     53    2017-06-12   3.0        1     #caught 3x, 1 trial
   22640     10.3    0     2017-04-20   NA         NA
   22640     49.2    24    2017-05-14   NA         NA
   22640     121     52    2017-06-11   2.6         1
   22640     196     84    2017-07-13   2.3         2     #caught 4x, 2 trials
   23943     12.9     1    2018-04-27   NA         NA
   23943     57.2    26    2018-05-23   1.7         1     #caught 2x, 1 trial
   23760     150     73    2018-06-18    NA        1      
   23760     165     84    2018-06-29    1.4        2     #caught 2x, 2 trials

I’d prefer a dplyr() solution, if possible.

score 2 · Accepted Answer · answered May 28 '19 at 15:05

Here is one way of doing it, using dplyr, assuming your data is df as set out above...

library(dplyr)
df %>% 
  arrange(squirrel_id, age) %>%                        #sort by id and age
  group_by(squirrel_id) %>%                            #group by id
  mutate(g.rate = c(NA, diff(wt) / diff(age)),         #calculate g.rate
         trial = row_number() - n() + 2                #counting up to 2 at end
                 - (n() <= 3)                          #-1 if 3 or fewer in group
                 + (n() <= 3 & sort(-age)[2] <= -40),  #+1 if also both age>40
         trial = ifelse(trial<1, NA, trial),           #set to NA if less than 1
         g.rate = ifelse(is.na(trial), NA, g.rate))    #set to NA if trial is NA

   squirrel_id    wt   age trialdate  g.rate trial
 1       22639   9.7     0 2017-04-20  NA       NA
 2       22639  45.9    24 2017-05-14  NA       NA
 3       22639 130      53 2017-06-12   2.9      1
 4       22640  10.3     0 2017-04-20  NA       NA
 5       22640  49.2    24 2017-05-14  NA       NA
 6       22640 121      52 2017-06-11   2.56     1
 7       22640 196      84 2017-07-13   2.34     2
 8       23760 150      73 2018-06-18  NA        1
 9       23760 165      84 2018-06-29   1.36     2
10       23943  12.9     1 2018-04-27  NA       NA
11       23943  57.2    26 2018-05-23   1.77     1

Note that the last two ids are in a different order than your data. You can avoid this by just doing arrange(age) in the first step.

score 2 · Answer 2 · answered May 28 '19 at 23:58

Here is another approach using tidyverse:

library(tidyverse)

df %>%
  arrange(squirrel_id, age) %>%
  group_by(squirrel_id) %>%
  mutate(trial = case_when(all(tail(age, 2) > 40) ~ c(rep(NA, n() - 2), 1, 2),
                           TRUE ~ c(rep(NA, n() - 1), 1)),
         g.rate = replace((wt -lag(wt)) / (age - lag(age)), 
                           1:(match(1, trial) - 1), NA)) %>%
  ungroup()
#> # A tibble: 11 x 6
#>    squirrel_id    wt   age trialdate  trial g.rate
#>          <dbl> <dbl> <dbl> <chr>      <dbl>  <dbl>
#>  1       22639   9.7     0 2017-04-20    NA  NA   
#>  2       22639  45.9    24 2017-05-14    NA  NA   
#>  3       22639 130      53 2017-06-12     1   2.9 
#>  4       22640  10.3     0 2017-04-20    NA  NA   
#>  5       22640  49.2    24 2017-05-14    NA  NA   
#>  6       22640 121      52 2017-06-11     1   2.56
#>  7       22640 196      84 2017-07-13     2   2.34
#>  8       23760 150      73 2018-06-18     1  NA   
#>  9       23760 165      84 2018-06-29     2   1.36
#> 10       23943  12.9     1 2018-04-27    NA  NA   
#> 11       23943  57.2    26 2018-05-23     1   1.77

Calculations for ordered observations using mutate() based on conditional statements

2 Answers2