I have a large data frame of camera trap observations from camera traps that are placed at different locations every month. One observation consists of five photos triggered by one animal. Excerpt of the data frame (`dput` of the first 20 rows):
```r
structure(list(deploymentid = structure(c(2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L), .Label = c("B4-Wintergatter_Riedlhäng",
"I3-Wintergatter_Riedlhäng"), class = "factor"), species = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L), .Label = "Rotwild", class = "factor"), time = structure(c(1520900972,
1520900972, 1520900972, 1520900972, 1520900972, 1520900982, 1520900982,
1520900982, 1520900982, 1520900982, 1520901025, 1520901025, 1520901025,
1520901025, 1520901025, 1520975705, 1520975705, 1520975705, 1520975705,
1520975705), class = c("POSIXct", "POSIXt"), tzone = "UTC")), .Names = c("deploymentid",
"species", "time"), row.names = c(NA, 20L), class = "data.frame")
```
For the analysis, I consider two consecutive observations independent if they are more than 2 min apart. To achieve this, I computed the time difference between consecutive photos for each camera deployment, selected all times with a difference larger than two minutes, and then subset the data frame to the photos taken at those selected times:
1) First, I used dplyr to compute the time interval to the previous photo of the same deployment. For the first observation of each deployment I arbitrarily chose 1000, a number larger than 120, so that these observations are included in my selection later.
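```r
library(dplyr)

deerobs_tbl <- as_tibble(Deerobs)                  # tbl_df() is deprecated
deerobs_gr  <- group_by(deerobs_tbl, deploymentid)
# sort by time within each deployment (pass the data, not deerobs_gr$time)
deerobs_or  <- arrange(deerobs_gr, time, .by_group = TRUE)
# seconds since the previous photo of the same deployment; 1000 (> 120) for the first one
deerobs_2   <- mutate(deerobs_or, diff = c(1000, as.numeric(diff(time), units = "secs")))
deerobs2_df <- data.frame(deerobs_2)
```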
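A caveat about `diff()` on `POSIXct` times: it returns a `difftime` whose units are chosen automatically from the smallest gap, and `c(1000, diff(time))` silently drops those units. In this data set each observation is a burst of photos with identical timestamps, so the smallest gap is 0 and the units happen to be seconds; a deployment without such zero gaps would get minutes (or hours), and the comparison with 120 would then be wrong. A small base-R illustration with two hypothetical timestamps three minutes apart:

```r
# two photos three minutes apart (hypothetical timestamps)
tt <- as.POSIXct(c("2018-03-13 00:00:00", "2018-03-13 00:03:00"), tz = "UTC")

d <- diff(tt)
units(d)                        # "mins": units picked automatically
c(1000, d)                      # 1000 3: units dropped, and 3 is not > 120
as.numeric(d, units = "secs")   # 180: explicit seconds
```

Requesting the units explicitly (`as.numeric(..., units = "secs")` or `difftime(..., units = "secs")`) makes the threshold independent of the data.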
2) I guess this would also have been possible with dplyr, but plyr was easier for me to use. I built a data frame containing only the deployment ID, the time and the time difference to the previous photo. Then, for each deployment, I selected the times that were more than 2 min after the previous photo and kept all rows taken at those times.
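```r
library(plyr)  # note: loading plyr after dplyr masks several dplyr functions

# deployment ID, time and time difference to the previous photo
deerobs_times <- data.frame(deerobs2_df$time, deerobs2_df$deploymentid, deerobs2_df$diff)
# per deployment, keep the times that are more than 120 s after the previous photo
deerobs_times_apart <- ddply(deerobs_times, "deerobs2_df.deploymentid", subset, deerobs2_df.diff > 120)
deerobs_t <- deerobs_times_apart[, 1]
# keep all photos taken at one of the selected times
Deerobs_subset <- subset(deerobs2_df, deerobs2_df$time %in% deerobs_t)
```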
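Step 2 can also be written with dplyr alone, which avoids loading plyr on top of dplyr. A minimal sketch on hypothetical toy data (one deployment, bursts of five photos at 0, 90, 180 and 400 s); it applies the same rule as the plyr code (keep times more than 120 s after the *previous* photo), so it reproduces the problem: the burst at 180 s is dropped although it is 180 s after the retained burst at 0 s.

```r
library(dplyr)

# Toy data (hypothetical): one deployment, bursts of five photos at 0, 90, 180 and 400 s
t0 <- as.POSIXct("2018-03-13 00:00:00", tz = "UTC")
toy <- data.frame(deploymentid = "A",
                  time = t0 + rep(c(0, 90, 180, 400), each = 5))

# times that come more than 120 s after the previous photo of the same deployment
toy_sel <- toy %>%
  group_by(deploymentid) %>%
  arrange(time, .by_group = TRUE) %>%
  mutate(diff = as.numeric(difftime(time, lag(time), units = "secs")),
         diff = coalesce(diff, 1000)) %>%   # first photo of each deployment
  filter(diff > 120) %>%
  ungroup() %>%
  distinct(deploymentid, time)

# keep every photo taken at a selected time of the same deployment
toy_subset <- semi_join(toy, toy_sel, by = c("deploymentid", "time"))
```

Joining on both `deploymentid` and `time` (rather than `time %in% ...` alone) also prevents photos from another deployment that happen to share a timestamp from being kept.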
The problem is that this removes far more observations than necessary: the number of photos drops from more than 9000 to fewer than 3000. For example, if ten observations follow each other at intervals of 1.5 minutes, all of them (except possibly the first) are removed, although five of them (at 0, 3, 6, 9 and 12 minutes) are more than two minutes apart from each other. Is there a way to avoid this and keep every observation that is more than two minutes after the last *retained* observation?
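One way to get this behaviour is a sequential (greedy) filter: walk through the distinct trigger times of each deployment in order and keep a time only if it is more than 120 s after the last *kept* time. The sketch below assumes the column names from the `dput` above (`deploymentid`, `time`); the helper name `keep_independent()` and the toy data are hypothetical. Because each decision depends on the previously kept time, it is written as a plain loop rather than with a vectorised `diff()`.

```r
library(dplyr)

# Hypothetical helper: TRUE for every photo whose trigger time is more than
# `min_gap` seconds after the last *kept* trigger time (greedy, in time order).
keep_independent <- function(time, min_gap = 120) {
  t  <- as.numeric(time)          # seconds since the epoch
  ut <- sort(unique(t))           # one entry per trigger event (burst of photos)
  kept <- logical(length(ut))
  last <- -Inf
  for (i in seq_along(ut)) {
    if (ut[i] - last > min_gap) {
      kept[i] <- TRUE
      last <- ut[i]
    }
  }
  t %in% ut[kept]                 # all photos of a kept event are retained
}

# Toy data (hypothetical): deployment A has ten events 90 s apart,
# deployment B two events 60 s apart; each event is a burst of five photos.
t0 <- as.POSIXct("2018-03-13 00:00:00", tz = "UTC")
toy <- data.frame(
  deploymentid = rep(c("A", "B"), times = c(50, 10)),
  time = t0 + c(rep(seq(0, by = 90, length.out = 10), each = 5),
                rep(c(0, 60), each = 5))
)

toy_indep <- toy %>%
  group_by(deploymentid) %>%
  filter(keep_independent(time)) %>%
  ungroup()
```

For deployment A this keeps the events at 0, 180, 360, 540 and 720 s (25 photos) and for deployment B only the event at 0 s (5 photos). On the real data the same call would presumably be `Deerobs %>% group_by(deploymentid) %>% filter(keep_independent(time))`.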