1

I have a very long set of data collected from animal transmitters. Due to variable recharge of the tranmsitter's solar batteries, the interval between data points is highly variable (ranging from 180 seconds up to over one hour). I want to subset the data so the interval between points is a minimum of 10 minutes, or 600 seconds.

Here is what a small subset of my data looks like:

    datetime             id
    01/09/2015 14:10:54  A
    01/09/2015 14:26:56  A
    01/09/2015 14:41:28  A
    01/09/2015 14:43:53  A
    01/09/2015 14:46:37  A
    01/09/2015 14:48:57  A
    01/09/2015 14:51:31  A
    01/09/2015 14:54:08  A
    04/09/2015 14:37:07  B
    04/09/2015 14:52:07  B
    04/09/2015 15:07:04  B
    04/09/2015 15:15:35  B
    04/09/2015 15:18:00  B
    04/09/2015 15:20:23  B
    04/09/2015 15:22:49  B
    04/09/2015 15:25:12  B 
    04/09/2015 15:28:52  B

My desired output with a minimum interval of 10 minutes would be:

    datetime             id
    01/09/2015 14:10:54  A
    01/09/2015 14:26:56  A
    01/09/2015 14:41:28  A
    01/09/2015 14:51:31  A
    01/09/2015 14:37:07  B
    04/09/2015 14:52:07  B
    04/09/2015 15:07:04  B
    04/09/2015 15:18:00  B 
    04/09/2015 15:28:52  B

I found an almost exact question with an answer here. Their data included id, date and time. Here is the code given in the answer:

    library(dplyr)
    library(lubridate)

    locdata %>% 
       mutate(timestamp = dmy_hm(paste(date, time))) %>%
       group_by(id, date) %>%
       mutate(delta = timestamp - first(timestamp),
       steps = as.numeric(floor(delta / 3600)), 
       change = ifelse(is.na(steps - lag(steps)), 1, steps - lag(steps))) %>%
          filter(change > 0) %>%
          select(id, date, timestamp)

I adapted this slightly to my data as below:

    result <- mydata %>%
                group_by(id) %>%                          
                mutate(delta = datetime - first(datetime),        
                steps = as.numeric(floor(delta / 600)),
                change = ifelse(is.na(steps - lag(steps)), 1, steps - lag(steps)))

The code results in this output:

    datetime             id    delta   steps   change
    01/09/2015 14:10:54  A     0        0      1
    01/09/2015 14:26:56  A     962      1      1
    01/09/2015 14:41:28  A     1834     3      2
    01/09/2015 14:51:31  A     2437     4      1
    04/09/2015 14:37:07  B     0        0      1
    04/09/2015 14:52:07  B     900      1      1
    04/09/2015 15:07:04  B     1797     2      1
    04/09/2015 15:15:35  B     2308     3      1
    04/09/2015 15:18:00  B     2453     4      1
    04/09/2015 15:22:29  B     3105     5      1

The output gives the first data point in each 10 minute time block starting at time zero (per id). This is not exactly what I need, as some of the time points are less than 10 mins apart. What I need is the next time that is 10 mins or more after the previous point within each id.

Any idea how I could do this? Would I need to use a loop? Thanks for any ideas.

KazRiri
  • 13
  • 2
  • Can you add an example of your data frame? `dput(locdata[1:30, ])` will do. – Istrel Oct 21 '19 at 07:55
  • Thanks for the response. Here is the 17 rows of data i included originally; "","datetime","id" "1",2015-09-01 14:10:54,"A" "2",2015-09-01 14:26:56,"A" "3",2015-09-01 14:41:28,"A" "4",2015-09-01 14:43:53,"A" "5",2015-09-01 14:46:37,"A" "6",2015-09-01 14:48:57,"A" "7",2015-09-01 14:51:31,"A" "8",2015-09-01 14:54:08,"A" "9",2015-09-04 14:37:07,"B" "10",2015-09-04 14:52:07,"B" "11",2015-09-04 15:07:04,"B" "12",2015-09-04 15:15:35,"B" "13",2015-09-04 15:18:00,"B" "14",2015-09-04 15:20:23,"B" "15",2015-09-04 15:22:49,"B" "16",2015-09-04 15:25:12,"B" "17",2015-09-04 15:28:52,"B" – KazRiri Oct 22 '19 at 04:47
  • @Istrel is this what you meant? There are only 17 rows here but I can add more if need be. Thanks again, and see my comments on the response below. – KazRiri Oct 22 '19 at 04:54

1 Answers1

0

I had this idea using dplyr (with lead() function) and a while loop, it mights help you

library(dplyr)
library(lubridate)

data <- data %>%
  mutate(date = lubridate::ymd_hms(datetime),
         id_rows = 1:nrow(.)) %>%
  group_by(id) %>%
  mutate(delta = lubridate::time_length(lag(date) %--% date, unit = "sec"))

while (min(data$delta, na.rm = T) < 600) {
  rm_rows <- data %>%
    filter(delta < 600) %>%
    filter(date == min(date)) %>%
    pull(id_rows)
  data <- data %>%
    filter(!id_rows %in% rm_rows) %>%
    mutate(delta = lubridate::time_length(lag(date) %--% date, unit = "sec"))
}
Jrmie
  • 26
  • 4
  • Thanks so much for the response. This code is great in that it is returning delta as the difference between each successive row within id. However it is deleting all rows that have a delta value below 600, which results in too many rows being deleted. For example when I run the code on my test data set of 17 rows above, it returns only 6 rows when it should return 9. I think I need a loop that deletes the next row if delta is below 600, then starts again. – KazRiri Oct 22 '19 at 03:33
  • To make the above point more clear, instead of calculating delta between each row all in one go and then deleting all those with a delta below 600, I need it to calculate delta between rows 1 and 2, delete row 2 if delta is below 600, then calculate delta between rows 1 and 3. – KazRiri Oct 22 '19 at 04:27
  • I edited the answer, I used `lead()` but here you want to use `lag()`. Now it returns 9 rows – Jrmie Oct 22 '19 at 16:17