I have a very long set of data collected from animal transmitters. Because the transmitters' solar batteries recharge at a variable rate, the interval between data points is highly variable (ranging from 180 seconds to over one hour). I want to subset the data so that consecutive points are at least 10 minutes (600 seconds) apart.
Here is what a small subset of my data looks like:
datetime id
01/09/2015 14:10:54 A
01/09/2015 14:26:56 A
01/09/2015 14:41:28 A
01/09/2015 14:43:53 A
01/09/2015 14:46:37 A
01/09/2015 14:48:57 A
01/09/2015 14:51:31 A
01/09/2015 14:54:08 A
04/09/2015 14:37:07 B
04/09/2015 14:52:07 B
04/09/2015 15:07:04 B
04/09/2015 15:15:35 B
04/09/2015 15:18:00 B
04/09/2015 15:20:23 B
04/09/2015 15:22:49 B
04/09/2015 15:25:12 B
04/09/2015 15:28:52 B
My desired output with a minimum interval of 10 minutes would be:
datetime id
01/09/2015 14:10:54 A
01/09/2015 14:26:56 A
01/09/2015 14:41:28 A
01/09/2015 14:51:31 A
04/09/2015 14:37:07 B
04/09/2015 14:52:07 B
04/09/2015 15:07:04 B
04/09/2015 15:18:00 B
04/09/2015 15:28:52 B
I found a nearly identical question with an answer here. Their data included id, date and time columns. Here is the code given in the answer:
library(dplyr)
library(lubridate)
locdata %>%
  mutate(timestamp = dmy_hm(paste(date, time))) %>%
  group_by(id, date) %>%
  mutate(delta = timestamp - first(timestamp),
         steps = as.numeric(floor(delta / 3600)),
         change = ifelse(is.na(steps - lag(steps)), 1, steps - lag(steps))) %>%
  filter(change > 0) %>%
  select(id, date, timestamp)
I adapted this slightly to my data as below:
result <- mydata %>%
  group_by(id) %>%
  mutate(delta = datetime - first(datetime),
         steps = as.numeric(floor(delta / 600)),
         change = ifelse(is.na(steps - lag(steps)), 1, steps - lag(steps))) %>%
  filter(change > 0)
The code results in this output:
datetime id delta steps change
01/09/2015 14:10:54 A 0 0 1
01/09/2015 14:26:56 A 962 1 1
01/09/2015 14:41:28 A 1834 3 2
01/09/2015 14:51:31 A 2437 4 1
04/09/2015 14:37:07 B 0 0 1
04/09/2015 14:52:07 B 900 1 1
04/09/2015 15:07:04 B 1797 2 1
04/09/2015 15:15:35 B 2308 3 1
04/09/2015 15:18:00 B 2453 4 1
04/09/2015 15:28:52 B 3105 5 1
The output gives the first data point in each 10-minute time block, with the blocks counted from time zero (per id). This is not exactly what I need, as some of the retained points are less than 10 minutes apart. What I need is, within each id, the next point that is 10 minutes or more after the previously retained point.
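To make the problem concrete, here is a small check on two consecutive rows of id B from the output above. The delta values are copied from that table; nothing else is assumed.

```r
# Seconds since the first fix of id B, for the rows at 15:15:35 and 15:18:00
delta <- c(2308, 2453)

# Fixed 600-second blocks: the two rows land in different blocks,
# so both have change > 0 and both are kept
steps <- floor(delta / 600)

# Actual spacing between the two rows, in seconds
gap <- diff(delta)

print(steps)
print(gap)
```

The two rows fall in blocks 3 and 4 even though they are only 145 seconds apart, which is why the block-based filter lets both through.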
Any idea how I could do this? Would I need to use a loop? Thanks for any ideas.
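For reference, the rule I am describing can be written as an explicit loop. This is only a minimal sketch, not a polished solution: keep_min_interval is a hypothetical helper name, the timestamps are assumed to be sorted within each id and to contain no missing values, and a plain R loop like this may be slow on a very long data set.

```r
# Hypothetical helper: flag a row as kept only if it is at least `min_gap`
# seconds after the last row that was kept (not after the previous row).
keep_min_interval <- function(times, min_gap = 600) {
  secs <- as.numeric(times)      # POSIXct (or numeric) -> seconds
  keep <- logical(length(secs))
  last_kept <- -Inf              # guarantees the first row is always kept
  for (i in seq_along(secs)) {
    if (secs[i] - last_kept >= min_gap) {
      keep[i] <- TRUE
      last_kept <- secs[i]
    }
  }
  keep
}

# The sample data from the question, parsed as day/month/year
mydata <- data.frame(
  datetime = as.POSIXct(
    c("01/09/2015 14:10:54", "01/09/2015 14:26:56", "01/09/2015 14:41:28",
      "01/09/2015 14:43:53", "01/09/2015 14:46:37", "01/09/2015 14:48:57",
      "01/09/2015 14:51:31", "01/09/2015 14:54:08",
      "04/09/2015 14:37:07", "04/09/2015 14:52:07", "04/09/2015 15:07:04",
      "04/09/2015 15:15:35", "04/09/2015 15:18:00", "04/09/2015 15:20:23",
      "04/09/2015 15:22:49", "04/09/2015 15:25:12", "04/09/2015 15:28:52"),
    format = "%d/%m/%Y %H:%M:%S", tz = "UTC"),
  id = rep(c("A", "B"), times = c(8, 9))
)

# Apply the helper separately within each id; ave() returns 0/1 here,
# so convert back to logical before subsetting
keep <- as.logical(ave(as.numeric(mydata$datetime), mydata$id,
                       FUN = keep_min_interval))
result <- mydata[keep, ]
print(result)
```

On the sample data this keeps the nine rows listed in my desired output. Inside a dplyr pipeline the same helper could presumably be used as group_by(id) %>% filter(keep_min_interval(datetime)), but I would still like to know whether there is a way to do this without a loop.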