0

This is a small sample from the df I am working with:

    animal  time period zone   
   <chr>  <dbl> <chr>  <chr>    
 1 HR1     0    q1     social    
 2 HR1     2.04 q1     social    
 3 HR1     2.1  q1     social    
 4 HR1     2.20 q1     interzone 
 5 HR1     2.44 q1     nest      
 6 HR1     2.47 q1     nest      
 7 HR1     2.68 q1     nest      
 8 HR1     2.71 q1     nest      
 9 HR1     2.87 q1     nest      
10 HR1     3.20 q1     nest      

I am trying to find a way to calculate how much time an animal spent in each zone. I would effectively like to generate another column where I get cumulative time for each of the visits to the zone. It would include taking max and minimum time point in the zone, subtracting them, and placing the end result into a new column. The end result should look something like this:

   animal  time period  zone   cum_time_zone
   <chr>  <dbl> <chr>  <chr>    <dbl>
 1 HR1     0    q1     social    NA
 2 HR1     2.04 q1     social    NA
 3 HR1     2.1  q1     social    2.2
 4 HR1     2.20 q1     interzone 0.24
 5 HR1     2.44 q1     nest      NA
 6 HR1     2.47 q1     nest      NA
 7 HR1     2.68 q1     nest      NA
 8 HR1     2.71 q1     nest      NA
 9 HR1     2.87 q1     nest      NA 
10 HR1     3.20 q1     nest      0.76

I will be very grateful for any input!

  • if interzone lasts 0.1, why does nest last 0.76? – Waldi Jun 08 '21 at 14:43
  • part of your question (the cumulative one) goes back to this question: https://stackoverflow.com/questions/18925600/r-cumulative-sum-by-condition. You can maybe read it and try it out for your case, then come back to clarify what is yet unclear – NicolasH2 Jun 08 '21 at 15:12
  • 1
    Hey Waldi, thanks for your comment - you are absolutely right, it shouldn't be 0.1 - I now corrected it. – Alja Podgornik Jun 08 '21 at 15:19

2 Answers2

2

With data.table:

library(data.table)

setDT(data)

data[,cum_time_zone := c(0,diff(time))][
     ,cum_time_zone:=fifelse(time==max(time),sum(cum_time_zone),NA_real_),by=rleid(animal,zone)]
data

    animal time period      zone cum_time_zone
 1:    HR1 0.00     q1    social            NA
 2:    HR1 2.04     q1    social            NA
 3:    HR1 2.10     q1    social           2.1
 4:    HR1 2.20     q1 interzone           0.1
 5:    HR1 2.44     q1      nest            NA
 6:    HR1 2.47     q1      nest            NA
 7:    HR1 2.68     q1      nest            NA
 8:    HR1 2.71     q1      nest            NA
 9:    HR1 2.87     q1      nest            NA
10:    HR1 3.20     q1      nest           1.0

Data:

data <- read.table(text="
      animal  time period zone   

1 HR1     0    q1     social    
2 HR1     2.04 q1     social    
3 HR1     2.1  q1     social    
4 HR1     2.20 q1     interzone 
5 HR1     2.44 q1     nest      
6 HR1     2.47 q1     nest      
7 HR1     2.68 q1     nest      
8 HR1     2.71 q1     nest      
9 HR1     2.87 q1     nest      
10 HR1    3.20 q1     nest  ",header=T)
Waldi
  • 39,242
  • 6
  • 30
  • 78
0

Here is one dplyr approach -

library(dplyr)

df %>%
  group_by(animal) %>%
  mutate(cum_time_zone = time - lag(time, default = 0)) %>%
  group_by(zone, .add = TRUE) %>%
  mutate(cum_time_zone = case_when(n() == 1 ~ cum_time_zone, 
                               row_number() == n()~ max(time) - min(time))) %>%
  ungroup()

#  animal  time period zone      cum_time_zone
#   <chr>  <dbl> <chr>  <chr>             <dbl>
# 1 HR1     0    q1     social           NA    
# 2 HR1     2.04 q1     social           NA    
# 3 HR1     2.1  q1     social            2.1  
# 4 HR1     2.2  q1     interzone         0.100
# 5 HR1     2.44 q1     nest             NA    
# 6 HR1     2.47 q1     nest             NA    
# 7 HR1     2.68 q1     nest             NA    
# 8 HR1     2.71 q1     nest             NA    
# 9 HR1     2.87 q1     nest             NA    
#10 HR1     3.2  q1     nest              0.76 

data

df <- structure(list(animal = c("HR1", "HR1", "HR1", "HR1", "HR1", 
"HR1", "HR1", "HR1", "HR1", "HR1"), time = c(0, 2.04, 2.1, 2.2, 
2.44, 2.47, 2.68, 2.71, 2.87, 3.2), period = c("q1", "q1", "q1", 
"q1", "q1", "q1", "q1", "q1", "q1", "q1"), zone = c("social", 
"social", "social", "interzone", "nest", "nest", "nest", "nest", 
"nest", "nest")), class = "data.frame", row.names = c(NA, -10L))
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • 1
    This unfortunately doesn't seem to work for me - I get a whole column of just NAs :( – Alja Podgornik Jun 08 '21 at 15:11
  • I have added data in my post using which you can test my answer. If it doesn't work on your real data can you provide a reproducible subset of data using `dput(head(df))`. – Ronak Shah Jun 09 '21 at 01:16