I am having a problem in my data set due to a data issue which is coming from the application, usually the dataset will have one unique row with a start time and an end time. Each row in the dataset corresponds to a an ID which always have 11 'checks' with a corresponding start time and an end time. But due to an issue we got two start times and end time which ruined the data set.
What I am envisioning to do is basically using dplyr and use the mutate function to check if the end time for a check for an ID is greater than the start time of the next check (ideally this should never happen but this happened so I want my code to check for these nuances and fix it).
The fix what I am looking for is that the end time for a specific check of an ID to be always smaller than the start time for the next check).
# My data looks like this:
df <- data.frame("ID" = c(A,A,A,A,A,A,A,A,A,A,A),
"Check"= c(1:11),
"Start_time" = c("2022-09-01 00:00:00.0","2022-09-01 00:00:00.0","2022-09-01 00:00:00.0","2022-09-14 13:03:00.0","2022-09-14 13:04:32.531","2022-09-14 13:04:35.0","2022-09-14 17:04:32.965","2022-09-19 22:05:10.137","2022-09-20 22:09:39.456","2022-09-22 18:44:06.992",
"2022-09-22 19:40:31.852","2022-09-22 15:41:02.0"),
"end_time" = c("2022-09-01 00:00:00.0","2022-09-01 00:00:00.0","2022-09-19 18:03:44.0","2022-09-19 18:05:08.0","2022-09-19 18:05:01.694","2022-09-19 18:05:06.0","2022-09-19 22:05:10.137","2022-09-20 22:09:39.456","2022-09-21 12:40:24.32","2022-09-22 18:44:06.992",
"2022-09-22 19:40:31.852","2022-09-23 12:05:07.0")
)
What I am trying myself to do this:
df %>%
group_by(ID,Check) %>%
mutate(end_time = ifelse(end_time>Start_time,Start_time,end_time)
But this approach with DPLYR is not clearly working and I don't see the end times being updated, maybe it has do with each ID having checks 1-5 so R doesn't know maybe for which check it should do this.
solution#2 which seems to be working but I don't think its an efficient way to do it:
df$end_time[df$check==1] <- as_datetime(ifelse(df$end_time[df$check==1] > df$start_time[df$check==2],df$start_time[df$check==2],df$end_time[df$check==1]))
df$end_time[df$check==2] <- as_datetime(ifelse(df$end_time[df$check==2]>df$start_time[df$check==3],df$start_time[df$check==3],df$end_time[df$check==2]))
df$end_time[df$check==3] <- as_datetime(ifelse(df$end_time[df$check==3]>df$start_time[df$check==4],df$start_time[df$check==4],df$end_time[df$check==3]))
df$end_time[df$check==4] <- as_datetime(ifelse(df$end_time[df$check==4]>df$start_time[df$check==5],df$start_time[df$check==5],df$end_time[df$check==4]))
df$end_time[df$check==5] <- as_datetime(ifelse(df$end_time[df$check==5]>df$start_time[df$check==6],df$start_time[df$check==6],df$end_time[df$check==5]))
df$end_time[df$check==6] <- as_datetime(ifelse(df$end_time[df$check==6]>df$start_time[df$check==7],df$start_time[df$check==7],df$end_time[df$check==6]))
df$end_time[df$check==7] <- as_datetime(ifelse(df$end_time[df$check==7]>df$start_time[df$check==8],df$start_time[df$check==8],df$end_time[df$check==7]))
df$end_time[df$check==8] <- as_datetime(ifelse(df$end_time[df$check==8]>df$start_time[df$check==9],df$start_time[df$check==9],df$end_time[df$check==8]))
df$end_time[df$check==9] <- as_datetime(ifelse(df$end_time[df$check==9]>df$start_time[df$check==10],df$start_time[df$check==10],df$end_time[df$check==9]))
df$end_time[df$check==10] <- as_datetime(ifelse(df$end_time[df$check==10]>df$start_time[df$check==11],df$start_time[df$check==11],df$end_time[df$check==10]))
df$end_time[df$check==11] <- as_datetime(ifelse(df$end_time[df$check==11]>df$start_time[df$check==12],df$start_time[df$check==12],df$end_time[df$check==11]))