Objective
I have a large dataset, df, where I have a Length, Date and Edit Column. My goal is to iterate through a large dataset and find the index, start and end times for a given condition.
Working backwards, I need to get the index or row number where Edit is False *with the condition that the previous Edit is True.This will output an 'End' and the value that is in the Length column.
The Start is generated by going backwards from the 'End' index (Edit is False) and when you come across the next (Edit is False) + 1
Length Date Edit
20 1/2/2020 1:00:00 AM False
21 1/2/2020 1:00:01 AM True
81 1/2/2020 1:00:02 AM True
81 1/2/2020 1:00:03 AM True
90 1/2/2020 1:00:04 AM False
20 1/2/2020 1:00:05 AM True
90 1/2/2020 1:00:06 AM True
81 1/2/2020 1:00:10 AM True
90 1/2/2020 1:00:15 AM False
20 1/2/2020 1:00:25 AM True
This is my desired output:
Start End Duration RowNum Length
1/2/2020 1:00:05 AM 1/2/2020 1:00:15 AM 10 8 90
1/2/2020 1:00:01 AM 1/2/2020 1:00:04 AM 3 4 90
Starting backwards, we see that the first End time is at, 1/2/2020 1:00:15 AM, because Edit is False, and its previous Edit value is True. The length is 90, and the RowNumber is 8. The Start would go backwards from 1/2/2020 1:00:15 AM until we come to another Edit is False line plus 1 , so it would be: 1/2/2020 1:00:05 AM
dput
structure(list(Length = c(20L, 21L, 81L, 81L, 90L, 20L, 90L,
81L, 90L, 20L), Date = structure(1:10, .Label = c("1/2/2020 1:00:00 AM",
"1/2/2020 1:00:01 AM", "1/2/2020 1:00:02 AM", "1/2/2020 1:00:03 AM",
"1/2/2020 1:00:04 AM", "1/2/2020 1:00:05 AM", "1/2/2020 1:00:06 AM",
"1/2/2020 1:00:10 AM", "1/2/2020 1:00:15 AM", "1/2/2020 1:00:25 AM"
), class = "factor"), Edit = c(FALSE, TRUE, TRUE, TRUE, FALSE,
TRUE, TRUE, TRUE, FALSE, TRUE)), class = "data.frame", row.names = c(NA,
-10L))
This is what I have tried
library(dplyr)
library(readr)
for (i in 1:nrow(df) {
if (df[i] == Edit == "False") {
print(df[rows])
}
else if (df[i] < condition) {
print(df[rows])
}
}
mutate(Date = as.POSIXct(Date, format = '%m/%d/%Y %H:%M:%OS')) %>%
mutate(RowNum = cumsum(!cond)) %>%
group_by(Length) %>%
summarize(Start = min(Date),
End = max(Date),
Duration = End - Start) %>%
I have a start, I am just unsure of how to put this together. Any help or suggestions are appreciated.