I am analyzing photos of animals from trail cameras. My data includes which camera took each picture, the date and time it was taken, and the animal in the photo. I want to aggregate the data by the time animals spent in front of the camera. For our purposes, a new encounter begins any time we photograph an animal more than 10 minutes after the previous photo of the same species. An encounter can last longer than 10 minutes: for example, 3 pictures of the same animal taken 7 minutes apart form a single 14-minute encounter. I want my output to collapse the data into individual encounters for all animals photographed, with a start time and end time for each encounter's photo series.
My code thus far:
library(dplyr)
#Data
df <- structure(list(camera_id = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 3L, 3L, 3L, 3L), date = c("11-May-21", "11-May-21", "11-May-21",
"15-May-21", "15-May-21", "10-May-21", "10-May-21", "12-May-21",
"12-May-21", "12-May-21", "12-May-21", "12-May-21", "13-May-21",
"13-May-21"), time = c("5:23:46", "5:23:50", "5:32:34", "9:35:20",
"9:35:35", "23:11:16", "23:11:17", "11:06:08", "11:15:09", "11:24:10",
"2:04:01", "2:04:03", "1:15:00", "1:15:50"), organism = c("mouse",
"mouse", "bird", "squirrel", "squirrel", "mouse", "mouse", "woodchuck",
"woodchuck", "woodchuck", "mouse", "mouse", "mouse", "mouse")), class = "data.frame", row.names = c(NA,
-14L))
#Combining date and time
df$datetime <- as.POSIXct(paste(df$date, df$time), format ="%d-%B-%y %H:%M:%S")
#Time differences in minutes, based on organism
#difftime() with units = "mins" is used because subtracting POSIXct values picks the units automatically (secs, mins, hours or days)
df <- df %>% group_by(organism) %>%
  mutate(timediff = as.numeric(difftime(datetime, lag(datetime), units = "mins"))) %>%
  ungroup()
#Round minutes to 2 decimal places
df$timediff <- round(df$timediff, digits=2)
#Make negative and NA values = 0. Negative values appear when going from one camera to the next. R thinks it is going back in time, rather than
#swapping cameras
df$timediff[is.na(df$timediff)] <- 0
df$timediff[df$timediff<0] <- 0
At this point, I want to use timediff as my condition for aggregation: any consecutive rows with a timediff < 10 should be merged into one encounter, as long as they share the same camera_id and organism. I've been trying different dplyr approaches but haven't been able to crack this. The output should look like this:
structure(list(camera_id = c(1L, 1L, 1L, 2L, 2L, 3L, 3L), start_datetime = c("5/11/2021 5:23",
"5/11/2021 5:32", "5/15/2021 9:35", "5/10/2021 23:11", "5/12/2021 11:06",
"5/12/2021 2:04", "5/13/2021 1:15"), end_datetime = c("5/11/2021 5:23",
"5/11/2021 5:32", "5/15/2021 9:35", "5/10/2021 23:11", "5/12/2021 11:24",
"5/12/2021 2:04", "5/13/2021 1:15"), organism = c("mouse", "bird",
"squirrel", "mouse", "woodchuck", "mouse", "mouse"), encounter_time = c("0:00:04",
"0:00:00", "0:00:15", "0:00:01", "0:18:02", "0:00:02", "0:00:50"
)), class = "data.frame", row.names = c(NA, -7L))
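For illustration, here is a minimal sketch of one way the grouping described above might be done. It is not a confirmed solution. It assumes a new encounter starts whenever the gap to the previous photo on the same camera and of the same organism is more than 10 minutes, and it numbers encounters with cumsum(). The data frame `photos` is a hypothetical subset of the sample data (camera 2 woodchuck, camera 3 mouse), and the names `gap_min`, `encounter_id` and `encounter_secs` as well as the "UTC" time zone are assumptions made for the example.

```r
library(dplyr)

# Hypothetical subset of the sample data, with datetimes already parsed.
# The "UTC" time zone is an assumption for this example.
photos <- data.frame(
  camera_id = c(2L, 2L, 2L, 3L, 3L, 3L, 3L),
  organism  = c("woodchuck", "woodchuck", "woodchuck",
                "mouse", "mouse", "mouse", "mouse"),
  datetime  = as.POSIXct(
    c("2021-05-12 11:06:08", "2021-05-12 11:15:09", "2021-05-12 11:24:10",
      "2021-05-12 02:04:01", "2021-05-12 02:04:03",
      "2021-05-13 01:15:00", "2021-05-13 01:15:50"),
    tz = "UTC"
  )
)

encounters <- photos %>%
  arrange(camera_id, organism, datetime) %>%
  group_by(camera_id, organism) %>%
  mutate(
    # Gap to the previous photo in the same camera/organism group, in minutes
    gap_min = as.numeric(difftime(datetime, lag(datetime), units = "mins")),
    # A new encounter starts at the first photo of a group (NA gap)
    # or after a gap of more than 10 minutes (a gap of exactly 10 is merged)
    encounter_id = cumsum(is.na(gap_min) | gap_min > 10)
  ) %>%
  group_by(camera_id, organism, encounter_id) %>%
  summarise(
    start_datetime = min(datetime),
    end_datetime   = max(datetime),
    encounter_secs = as.numeric(
      difftime(max(datetime), min(datetime), units = "secs")
    ),
    .groups = "drop"
  )

print(encounters)
```

Grouping by camera_id as well as organism means the lag never crosses from one camera to the next, so no negative gaps should need to be zeroed out. Swapping `photos` for the `df` above (with its `datetime` column) should work the same way, though that has only been reasoned through on this subset.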