Calculating cumulative time per event

Question

This is a small sample from the df I am working with:

    animal  time period zone   
   <chr>  <dbl> <chr>  <chr>    
 1 HR1     0    q1     social    
 2 HR1     2.04 q1     social    
 3 HR1     2.1  q1     social    
 4 HR1     2.20 q1     interzone 
 5 HR1     2.44 q1     nest      
 6 HR1     2.47 q1     nest      
 7 HR1     2.68 q1     nest      
 8 HR1     2.71 q1     nest      
 9 HR1     2.87 q1     nest      
10 HR1     3.20 q1     nest

I am trying to find a way to calculate how much time an animal spent in each zone. I would effectively like to generate another column where I get cumulative time for each of the visits to the zone. It would include taking max and minimum time point in the zone, subtracting them, and placing the end result into a new column. The end result should look something like this:

   animal  time period  zone   cum_time_zone
   <chr>  <dbl> <chr>  <chr>    <dbl>
 1 HR1     0    q1     social    NA
 2 HR1     2.04 q1     social    NA
 3 HR1     2.1  q1     social    2.2
 4 HR1     2.20 q1     interzone 0.24
 5 HR1     2.44 q1     nest      NA
 6 HR1     2.47 q1     nest      NA
 7 HR1     2.68 q1     nest      NA
 8 HR1     2.71 q1     nest      NA
 9 HR1     2.87 q1     nest      NA 
10 HR1     3.20 q1     nest      0.76

I will be very grateful for any input!

part of your question (the cumulative one) goes back to this question: https://stackoverflow.com/questions/18925600/r-cumulative-sum-by-condition. You can maybe read it and try it out for your case, then come back to clarify what is yet unclear — NicolasH2, Jun 08 '21 at 15:12
Hey Waldi, thanks for your comment - you are absolutely right, it shouldn't be 0.1 - I now corrected it. — Alja Podgornik, Jun 08 '21 at 15:19

Waldi · Answer 1 · 2021-06-08T15:44:45.457

With data.table:

library(data.table)

setDT(data)

data[,cum_time_zone := c(0,diff(time))][
     ,cum_time_zone:=fifelse(time==max(time),sum(cum_time_zone),NA_real_),by=rleid(animal,zone)]
data

    animal time period      zone cum_time_zone
 1:    HR1 0.00     q1    social            NA
 2:    HR1 2.04     q1    social            NA
 3:    HR1 2.10     q1    social           2.1
 4:    HR1 2.20     q1 interzone           0.1
 5:    HR1 2.44     q1      nest            NA
 6:    HR1 2.47     q1      nest            NA
 7:    HR1 2.68     q1      nest            NA
 8:    HR1 2.71     q1      nest            NA
 9:    HR1 2.87     q1      nest            NA
10:    HR1 3.20     q1      nest           1.0

Data:

data <- read.table(text="
      animal  time period zone   

1 HR1     0    q1     social    
2 HR1     2.04 q1     social    
3 HR1     2.1  q1     social    
4 HR1     2.20 q1     interzone 
5 HR1     2.44 q1     nest      
6 HR1     2.47 q1     nest      
7 HR1     2.68 q1     nest      
8 HR1     2.71 q1     nest      
9 HR1     2.87 q1     nest      
10 HR1    3.20 q1     nest  ",header=T)

score 0 · Answer 2 · answered Jun 08 '21 at 14:37

Here is one dplyr approach -

library(dplyr)

df %>%
  group_by(animal) %>%
  mutate(cum_time_zone = time - lag(time, default = 0)) %>%
  group_by(zone, .add = TRUE) %>%
  mutate(cum_time_zone = case_when(n() == 1 ~ cum_time_zone, 
                               row_number() == n()~ max(time) - min(time))) %>%
  ungroup()

#  animal  time period zone      cum_time_zone
#   <chr>  <dbl> <chr>  <chr>             <dbl>
# 1 HR1     0    q1     social           NA    
# 2 HR1     2.04 q1     social           NA    
# 3 HR1     2.1  q1     social            2.1  
# 4 HR1     2.2  q1     interzone         0.100
# 5 HR1     2.44 q1     nest             NA    
# 6 HR1     2.47 q1     nest             NA    
# 7 HR1     2.68 q1     nest             NA    
# 8 HR1     2.71 q1     nest             NA    
# 9 HR1     2.87 q1     nest             NA    
#10 HR1     3.2  q1     nest              0.76

data

df <- structure(list(animal = c("HR1", "HR1", "HR1", "HR1", "HR1", 
"HR1", "HR1", "HR1", "HR1", "HR1"), time = c(0, 2.04, 2.1, 2.2, 
2.44, 2.47, 2.68, 2.71, 2.87, 3.2), period = c("q1", "q1", "q1", 
"q1", "q1", "q1", "q1", "q1", "q1", "q1"), zone = c("social", 
"social", "social", "interzone", "nest", "nest", "nest", "nest", 
"nest", "nest")), class = "data.frame", row.names = c(NA, -10L))

This unfortunately doesn't seem to work for me - I get a whole column of just NAs :( — Alja Podgornik, Jun 08 '21 at 15:11
I have added data in my post using which you can test my answer. If it doesn't work on your real data can you provide a reproducible subset of data using `dput(head(df))`. — Ronak Shah, Jun 09 '21 at 01:16

Calculating cumulative time per event

2 Answers2