
I have wildlife camera trap data. Often one animal will trigger a camera repeatedly if it remains in its frame for a long period of time. I would like to identify when this occurs.

If consecutive events (rows) are less than 5 minutes apart (within a date), I assume it is one animal. I would like to keep one row from each such run and discard the rest. I would also like to group by site. Here is an example of my data and the desired outcome.

Current data:

tibble::tribble(
  ~date, ~time, ~site,
    "24/08/2019", "14:44",  "A",
    "24/08/2019", "14:45",  "A",
    "24/08/2019", "14:46",  "A",
    "24/08/2019", "14:50",  "A",
    "24/08/2019", "14:47",  "B",
    "24/08/2019", "14:48",  "B",
    "24/08/2019", "17:14",  "B",
    "24/08/2019", "17:18",  "B",
    "24/08/2019", "20:04",  "B",
    "25/08/2019", "14:42",  "A"
  )
date       time   site           
24/08/2019 14:44  A                        
24/08/2019 14:45  A                        
24/08/2019 14:46  A
24/08/2019 14:50  A           
24/08/2019 14:47  B                        
24/08/2019 14:48  B
24/08/2019 17:14  B
24/08/2019 17:18  B
24/08/2019 20:04  B
25/08/2019 14:42  A

Desired outcome:

date       time   site           
24/08/2019 14:44  A                                      
24/08/2019 14:47  B                        
24/08/2019 17:14  B
24/08/2019 20:04  B
25/08/2019 14:42  A

Thank you in advance!

Eric
kalex
  • Thanks very much for pointing this out and allowing me to clarify! I would like to consider the observations you described as in the same group. – kalex Mar 25 '21 at 16:34
  • You may check this canonical post for the `cumsum` / `diff(x)` idiom to create a grouping variable according to differences between rows / consecutive events: [How to split a vector into groups of consecutive sequences?](https://stackoverflow.com/questions/5222061/how-to-split-a-vector-into-groups-of-consecutive-sequences). You don't have to `split`, just play with the `cumsum(...diff`. Once you have created the "sequence grouping variable", you can use it together with site to "do something" for each group, e.g. select the first row. – Henrik Mar 25 '21 at 16:50

1 Answer


Using the data shown reproducibly in the Note at the end, sort the data by site and datetime and append a diff column giving the time difference in minutes between successive rows within the same site; call the result DFs. From that we derive a membership column, which assigns a unique number to each set of rows that are near each other, using cumsum(diff >= 5): the counter increases by one at every row whose gap from the previous row is at least 5 minutes (the first row of each site has diff set to Inf, so it always starts a new group). We then choose the first row in each group.

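library(dplyr)

# Gap in minutes to the previous row within each site (Inf for the first row of a site)
DFs <- DF %>%
  arrange(site, datetime) %>%
  group_by(site) %>%
  mutate(diff = c(Inf, as.numeric(diff(datetime), units = "mins"))) %>%
  ungroup()

# A gap of 5 minutes or more starts a new group; keep the first row of each group
DFs %>%
  group_by(membership = cumsum(diff >= 5)) %>%
  slice(1) %>%
  ungroup()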
## # A tibble: 5 x 6
##   date       time  site  datetime             diff membership
##   <chr>      <chr> <chr> <dttm>              <dbl>      <int>
## 1 24/08/2019 14:44 A     2019-08-24 14:44:00   Inf          1
## 2 25/08/2019 14:42 A     2019-08-25 14:42:00  1432          2
## 3 24/08/2019 14:47 B     2019-08-24 14:47:00   Inf          3
## 4 24/08/2019 17:14 B     2019-08-24 17:14:00   146          4
## 5 24/08/2019 20:04 B     2019-08-24 20:04:00   166          5
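
To see why cumsum(diff >= 5) yields group numbers, the step can be run in isolation on a small vector. This is only an illustrative sketch: the names `mins`, `gap` and `first_of_group` are hypothetical, and the values are the site A times from the example expressed as minutes since midnight, plus one made-up later event.

```r
# Minutes since midnight: 14:44, 14:45, 14:46, 14:50, plus a made-up later event
mins <- c(884, 885, 886, 890, 1000)

# Gap to the previous event; Inf marks the first event as the start of a group
gap <- c(Inf, diff(mins))

# TRUE wherever a new group starts; cumsum turns those flags into group numbers
membership <- cumsum(gap >= 5)
membership
# 1 1 1 1 2

# Keep the first event of each group
first_of_group <- mins[!duplicated(membership)]
first_of_group
# 884 1000
```

Note that 14:50 stays in the first group because it is only 4 minutes after 14:46, even though it is 6 minutes after 14:44; the rule compares consecutive events, not each event with the start of its group.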

Another approach is to create an igraph graph g (see diagram at end) with one vertex per row and an edge between successive rows that are less than 5 minutes apart. The connected components of that graph can be used to form membership, and then we proceed as above.

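library(igraph)

nr <- nrow(DFs)
g <- make_empty_graph(n = nr)            # one vertex per row, no edges yet
wx <- which(DFs$diff < 5)                # rows less than 5 minutes after the previous row
g <- add_edges(g, c(rbind(wx - 1, wx)))  # join each such row to its predecessor
plot(g) # see plot at end

DFs$membership <- components(g)$membership  # each connected component is one group

DFs %>%
  group_by(membership) %>%
  slice(1) %>%
  ungroup()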

[Plot of graph g: one vertex per row, with edges joining successive rows less than 5 minutes apart]

Note

Lines <- "
date       time   site           
24/08/2019 14:44  A                        
24/08/2019 14:45  A                        
24/08/2019 14:46  A
24/08/2019 14:50  A           
24/08/2019 14:47  B                        
24/08/2019 14:48  B
24/08/2019 17:14  B
24/08/2019 17:18  B
24/08/2019 20:04  B
25/08/2019 14:42  A"
DF <- read.table(text = Lines, header = TRUE)
DF$datetime <- as.POSIXct(paste(DF$date, DF$time), format = "%d/%m/%Y %H:%M")
G. Grothendieck
  • Thank you kindly! I'm afraid it did not work: the membership variable is not assigning events to the same membership based on our rules. For example, 2019-10-11 15:05:00 and 2019-10-11 15:07:00 are getting assigned to two separate memberships. Hmm. – kalex Mar 25 '21 at 20:35
  • Since it works in the answer you will need to show the data in reproducible form that leads to problems. – G. Grothendieck Mar 25 '21 at 22:20
  • One thing to check is that `units(diff(DFs$datetime))` gives `mins` and not some other unit. If it does give some other unit then presumably your actual data varies from what was shown but you can force it to be in minutes by replacing the relevant line with `mutate(diff = c(Inf, diff(as.numeric(datetime)/60))) %>%` where `datetime` is of POSIXct class as in the answer. – G. Grothendieck Mar 25 '21 at 23:47
  • That works! Thank you very much for your assistance. – kalex Mar 26 '21 at 20:54
  • Have modified code to force units to be in minutes. – G. Grothendieck Mar 28 '21 at 12:43
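
The units issue discussed in the comments can be reproduced with a small sketch. The timestamps below are hypothetical (only 15:05:00 and 15:07:00 come from the comment; the 30-second event is made up to trigger the effect), and `x` and `d` are illustrative names. When the smallest gap is under a minute, `diff()` on POSIXct values reports seconds, so an unconverted comparison against 5 would be testing 5 seconds rather than 5 minutes.

```r
# Hypothetical timestamps; the 30-second gap makes the smallest difference < 1 minute
x <- as.POSIXct(c("2019-10-11 15:05:00",
                  "2019-10-11 15:05:30",
                  "2019-10-11 15:07:00"), tz = "UTC")

d <- diff(x)
units(d)
# "secs"  (chosen automatically from the smallest gap)

as.numeric(d)
# 30 90   (seconds, so `>= 5` would wrongly split these rows)

# Two ways to force minutes, matching the answer and the comment respectively
as.numeric(d, units = "mins")
# 0.5 1.5
diff(as.numeric(x)) / 60
# 0.5 1.5
```

Either conversion makes the `diff >= 5` threshold independent of whichever unit R happens to pick for a given data set.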