1

I am having a problem in my data set due to a data issue which is coming from the application, usually the dataset will have one unique row with a start time and an end time. Each row in the dataset corresponds to a an ID which always have 11 'checks' with a corresponding start time and an end time. But due to an issue we got two start times and end time which ruined the data set.

What I am envisioning to do is basically using dplyr and use the mutate function to check if the end time for a check for an ID is greater than the start time of the next check (ideally this should never happen but this happened so I want my code to check for these nuances and fix it).

The fix what I am looking for is that the end time for a specific check of an ID to be always smaller than the start time for the next check).

# My data looks like this:
df <- data.frame("ID" = c(A,A,A,A,A,A,A,A,A,A,A),
                 "Check"= c(1:11),
                 "Start_time" = c("2022-09-01 00:00:00.0","2022-09-01 00:00:00.0","2022-09-01 00:00:00.0","2022-09-14 13:03:00.0","2022-09-14 13:04:32.531","2022-09-14 13:04:35.0","2022-09-14 17:04:32.965","2022-09-19 22:05:10.137","2022-09-20 22:09:39.456","2022-09-22 18:44:06.992",
    "2022-09-22 19:40:31.852","2022-09-22 15:41:02.0"),
                 "end_time" = c("2022-09-01 00:00:00.0","2022-09-01 00:00:00.0","2022-09-19 18:03:44.0","2022-09-19 18:05:08.0","2022-09-19 18:05:01.694","2022-09-19 18:05:06.0","2022-09-19 22:05:10.137","2022-09-20 22:09:39.456","2022-09-21 12:40:24.32","2022-09-22 18:44:06.992",
    "2022-09-22 19:40:31.852","2022-09-23 12:05:07.0")
                 )

What I am trying myself to do this:

df %>%
group_by(ID,Check) %>% 
mutate(end_time = ifelse(end_time>Start_time,Start_time,end_time)

But this approach with DPLYR is not clearly working and I don't see the end times being updated, maybe it has do with each ID having checks 1-5 so R doesn't know maybe for which check it should do this.

solution#2 which seems to be working but I don't think its an efficient way to do it:

df$end_time[df$check==1] <- as_datetime(ifelse(df$end_time[df$check==1] > df$start_time[df$check==2],df$start_time[df$check==2],df$end_time[df$check==1]))
df$end_time[df$check==2] <- as_datetime(ifelse(df$end_time[df$check==2]>df$start_time[df$check==3],df$start_time[df$check==3],df$end_time[df$check==2]))
df$end_time[df$check==3] <- as_datetime(ifelse(df$end_time[df$check==3]>df$start_time[df$check==4],df$start_time[df$check==4],df$end_time[df$check==3]))
df$end_time[df$check==4] <- as_datetime(ifelse(df$end_time[df$check==4]>df$start_time[df$check==5],df$start_time[df$check==5],df$end_time[df$check==4]))
df$end_time[df$check==5] <- as_datetime(ifelse(df$end_time[df$check==5]>df$start_time[df$check==6],df$start_time[df$check==6],df$end_time[df$check==5]))
df$end_time[df$check==6] <- as_datetime(ifelse(df$end_time[df$check==6]>df$start_time[df$check==7],df$start_time[df$check==7],df$end_time[df$check==6]))
df$end_time[df$check==7] <- as_datetime(ifelse(df$end_time[df$check==7]>df$start_time[df$check==8],df$start_time[df$check==8],df$end_time[df$check==7]))
df$end_time[df$check==8] <- as_datetime(ifelse(df$end_time[df$check==8]>df$start_time[df$check==9],df$start_time[df$check==9],df$end_time[df$check==8]))
df$end_time[df$check==9] <- as_datetime(ifelse(df$end_time[df$check==9]>df$start_time[df$check==10],df$start_time[df$check==10],df$end_time[df$check==9]))
df$end_time[df$check==10] <- as_datetime(ifelse(df$end_time[df$check==10]>df$start_time[df$check==11],df$start_time[df$check==11],df$end_time[df$check==10]))
df$end_time[df$check==11] <- as_datetime(ifelse(df$end_time[df$check==11]>df$start_time[df$check==12],df$start_time[df$check==12],df$end_time[df$check==11]))

1 Answers1

0

In the data you have posted, the Start_time and end_time columns are character vectors. So I had to mutate() them. You can skip that if your they are POSIXct in your data.

The second issue is that ifelse() can never return an object of class POSIXct. But you can use dplyr::if_else (or data.table::fifelse()) which can:

df |>
    mutate(
        across(Start_time:end_time, as.POSIXct)
    ) |>
    group_by(ID,Check) |> 
    mutate(
        end_time = if_else(
            end_time>Start_time,
            Start_time,
            end_time
        )
    )

# # A tibble: 12 x 4
# # Groups:   ID, Check [12]
#    ID    Check Start_time          end_time
#    <chr> <int> <dttm>              <dttm>
#  1 A         1 2022-09-01 00:00:00 2022-09-01 00:00:00
#  2 A         2 2022-09-01 00:00:00 2022-09-01 00:00:00
#  3 A         3 2022-09-01 00:00:00 2022-09-01 00:00:00
#  4 A         4 2022-09-14 13:03:00 2022-09-14 13:03:00
#  5 A         5 2022-09-14 13:04:32 2022-09-14 13:04:32
#  6 A         6 2022-09-14 13:04:35 2022-09-14 13:04:35
#  7 A         7 2022-09-14 17:04:32 2022-09-14 17:04:32
#  8 A         8 2022-09-19 22:05:10 2022-09-19 22:05:10
#  9 A         9 2022-09-20 22:09:39 2022-09-20 22:09:39
# 10 A        10 2022-09-22 18:44:06 2022-09-22 18:44:06
# 11 A        11 2022-09-22 19:40:31 2022-09-22 19:40:31
# 12 A        12 2022-09-22 15:41:02 2022-09-22 15:41:02

Input data:

df  <- structure(list(ID = c("A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A"), Check = 1:12, Start_time = c("2022-09-01 00:00:00.0",
"2022-09-01 00:00:00.0", "2022-09-01 00:00:00.0", "2022-09-14 13:03:00.0",
"2022-09-14 13:04:32.531", "2022-09-14 13:04:35.0", "2022-09-14 17:04:32.965",
"2022-09-19 22:05:10.137", "2022-09-20 22:09:39.456", "2022-09-22 18:44:06.992",
"2022-09-22 19:40:31.852", "2022-09-22 15:41:02.0"), end_time = c("2022-09-01 00:00:00.0",
"2022-09-01 00:00:00.0", "2022-09-19 18:03:44.0", "2022-09-19 18:05:08.0",
"2022-09-19 18:05:01.694", "2022-09-19 18:05:06.0", "2022-09-19 22:05:10.137",
"2022-09-20 22:09:39.456", "2022-09-21 12:40:24.32", "2022-09-22 18:44:06.992",
"2022-09-22 19:40:31.852", "2022-09-23 12:05:07.0")), class = "data.frame", row.names = c(NA,
-12L))
SamR
  • 8,826
  • 3
  • 11
  • 33
  • Thanks for your help, i am trying both the as_datetime before ifelse and also dplyr::ifelse but still the data is not successfully being changed – Bilal Jamil Oct 12 '22 at 15:32