Problem
I have a very large data frame (almost 20,000 rows) where data was documented approximately every 1-3 minutes. Unfortunately, I am unable to upload any authentic data to this post.
Aim
The aim is to reduce the size of the data frame by filtering the rows by date and time into 5-minute intervals for each month. The data spans over 4 years.
I have tried using different packages and functions in R to figure out how to do this over the last couple of days to no avail such as dplyr(), aggregate(), and tidyverse(), and I just can't solve this conundrum.
Structure of the data frame
I have a data frame like this
ID Date Time
1 9/15/16 6:48:00 AM
2 9/15/16 6:54:00 AM
3 9/15/16 6:57:00 AM
4 9/15/16 6:59:00 AM
5 9/15/16 7:03:00 AM
6 9/15/16 7:05:00 AM
I would like to convert this data into a data frame like the example below by calculating the number of minutes and seconds between each subsequent 'ID' row.
ID Date Start_Time End_Time Minutes Seconds
1 9/15/16 6:48:00 AM 6:54:00 AM 5.0 300.00
2 9/15/16 6:54:00 AM 6:57:00 AM 3.0 180.00
3 9/15/16 6:57:00 AM 6:59:00 AM 2.0 120.00
4 9/15/16 6:59:00 AM 7:03:00 AM 4.0 240.00
5 9/15/16 7:03:00 AM 7:05:00 AM 2.0 120.00
6 9/15/16 7:05:00 AM etc
Afterwards, I'd like to filter the data frame containing the new calculations between the subsequent rows of 'IDs' by date and time into 5-minute or 300.0-second time intervals per month to reduce the size of the data frame.
The output should be something like this unless someone has a more efficient method.
ID Date Start_Time End_Time Minutes Seconds
1 9/15/16 6:48:00 AM 6:54:00 AM 5.0 300.00
I appreciate your thoughts on this
Many thanks in advance.
Progress
Many many thanks in advance for this solution, it worked really well. Sorry for all these questions, I am a novice with R. Could I please query what the warning message below means (ℹ 232 failed to parse) and what went wrong with the calculations of minutes and seconds for the rows in my data frame(' New_Track' - see below)? The minutes and seconds are exactly the same in all columns called 'Minutes' and 'Seconds'. In addition, for IDs 7 and 8, the new calculations show the difference is 950520 minutes and seconds when the correct calculation is approximately 2 minutes or 120 seconds (see below).
Row 7
7 7 9/15/16 7:07:00 AM 2016-09-15 07:07:00 2016-09-26 07:09:00 950520M 0S 950520
8 8 9/26/16 7:09:00 AM 2016-09-26 07:09:00 2016-09-26 07:11:00 120M 0S 120
Warning Message:
##My data frame is called 'track' and the columns are: (1) ID; (2) Date; and (3) Time
##Code:
New_Track <- data.frame(
+ stringsAsFactors = FALSE,
+ ID = track$ID,
+ Date = track$Date,
+ Time = track$Time
+ ) %>%
+ mutate(Start_Time = mdy_hms(paste(Date, Time)),
+ End_Time = lead(Start_Time),
+ Minutes = minutes(End_Time-Start_Time),
+ Seconds = (End_Time-Start_Time) / dseconds(1))
Warning message:
Problem while computing `Start_Time = mdy_hms(paste(Date, Time))`.
ℹ 232 failed to parse.
New Data frame Layout - 'New_Track'
ID Date Time Start_Time End_Time Minutes Seconds
1 1 9/15/16 6:48:00 AM 2016-09-15 06:48:00 2016-09-15 06:54:00 360M 0S 360
2 2 9/15/16 6:54:00 AM 2016-09-15 06:54:00 2016-09-15 06:57:00 180M 0S 180
3 3 9/15/16 6:57:00 AM 2016-09-15 06:57:00 2016-09-15 06:59:00 120M 0S 120
4 4 9/15/16 6:59:00 AM 2016-09-15 06:59:00 2016-09-15 07:03:00 240M 0S 240
5 5 9/15/16 7:03:00 AM 2016-09-15 07:03:00 2016-09-15 07:05:00 120M 0S 120
6 6 9/15/16 7:05:00 AM 2016-09-15 07:05:00 2016-09-15 07:07:00 120M 0S 120
7 7 9/15/16 7:07:00 AM 2016-09-15 07:07:00 2016-09-26 07:09:00 950520M 0S 950520
8 8 9/26/16 7:09:00 AM 2016-09-26 07:09:00 2016-09-26 07:11:00 120M 0S 120
9 9 9/26/16 7:11:00 AM 2016-09-26 07:11:00 2016-09-26 07:13:00 120M 0S 120