I am trying to avoid a for loop and use apply instead for post-processing flags that I have detected.
I have a time series with a column showing whether the quality was OK or not. Here's what the data frame looks like:
n <- 100
tstart <- strptime("12/15/16 16:00:00", "%m/%d/%y %H:%M:%S")
df <- data.frame(Date = tstart + seq(0, n*5-1, 5) + sample(seq(0, 3, 1), n, replace = TRUE),
                 Check = sample(c("FLAG", "PASS"), n, replace = TRUE))
# head of df
# Date Check
# 1 2016-12-15 16:00:02 FLAG
# 2 2016-12-15 16:00:05 PASS
# 3 2016-12-15 16:00:13 FLAG
# 4 2016-12-15 16:00:17 PASS
# 5 2016-12-15 16:00:22 FLAG
# 6 2016-12-15 16:00:26 FLAG
I don't want to pick up all the FLAGs, though. I want to apply the following conditions:
1) Disregard flags where the time difference from the previous row is more than 60 seconds.
2) Keep flags that have been repeating for a while.
Here's how I am implementing this:
# Force seconds: diff() on date-times picks its units automatically
df$Time_Difference <- c(0, as.numeric(diff(df$Date), units = "secs"))
df$Flag_Counter <- 0
desired_rep <- 3
# Start the clock!
ptm <- proc.time()
for (row_index in 2:nrow(df)) {
  if (df[row_index, "Time_Difference"] > 60) {
    # Gap too long: reset the counter
    df[row_index, "Flag_Counter"] <- 0
  } else {
    if (df[row_index, "Check"] == "PASS") {
      # PASS: count down, but never below 0
      df[row_index, "Flag_Counter"] <- max(0, df[row_index - 1, "Flag_Counter"] - 1)
    } else {
      # FLAG: count up, capped at desired_rep
      df[row_index, "Flag_Counter"] <- min(desired_rep, df[row_index - 1, "Flag_Counter"] + 1)
    }
  }
}
# Stop the clock
x <- proc.time() - ptm
print(x[3])
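For comparison, here is a minimal sketch of the same update rule running on plain vectors instead of indexing the data frame on every iteration. The helper name flag_counter and its max_gap argument are hypothetical names made up for this illustration, and I have not benchmarked it against the loop above.

```r
# Hypothetical helper: same update rule as the loop above, but on plain
# vectors, so no data frame cell is read or written inside the loop.
flag_counter <- function(time_diff, check, desired_rep = 3, max_gap = 60) {
  n <- length(check)
  counter <- numeric(n)            # row 1 stays at 0, as in the loop above
  is_pass <- check == "PASS"       # computed once, outside the loop
  for (i in seq_len(n)[-1]) {      # rows 2..n (empty when n < 2)
    if (time_diff[i] > max_gap) {
      counter[i] <- 0                                     # gap too long: reset
    } else if (is_pass[i]) {
      counter[i] <- max(0, counter[i - 1] - 1)            # PASS: count down
    } else {
      counter[i] <- min(desired_rep, counter[i - 1] + 1)  # FLAG: count up, capped
    }
  }
  counter
}
```

It would be used as df$Flag_Counter <- flag_counter(df$Time_Difference, df$Check, desired_rep).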
So the for loop really picks out the flags that have been repeating for desired_rep times in a row. If we have a PASS after two FLAGs, Flag_Counter drops to 1. Finally, with df[df$Flag_Counter == 3, ] we can get the post-processed flags. Now, this is extremely slow. I was wondering if we can use apply to make this task faster. I have done this in Python, but I don't know how to access previous rows in my pre-defined function and then use apply. I appreciate your help.
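To make the "access the previous row" part concrete: apply calls its function on each row independently, so the previous counter value is not available to it. One base R option that does carry a running value forward is Reduce with accumulate = TRUE. Below is a rough sketch on a small hand-made example; the name next_count and the toy vectors are made up for illustration, and I am not claiming this is faster than the loop.

```r
# Toy input, made up for this illustration
check       <- c("FLAG", "FLAG", "FLAG", "PASS", "FLAG", "FLAG")
time_diff   <- c(0, 5, 5, 5, 100, 5)
desired_rep <- 3

# Hypothetical helper: takes the previous counter value and the current
# row index, and returns the counter value for the current row.
next_count <- function(prev, i) {
  if (time_diff[i] > 60) return(0)       # gap too long: reset
  if (check[i] == "PASS") {
    max(0, prev - 1)                     # PASS: count down
  } else {
    min(desired_rep, prev + 1)           # FLAG: count up, capped
  }
}

# Fold over rows 2..n, starting from 0 for row 1 and keeping every
# intermediate value, so the result has one entry per row.
counter <- unlist(Reduce(next_count, seq_along(check)[-1],
                         accumulate = TRUE, 0))
print(counter)
```

The function passed to Reduce still runs once per row, so this mainly changes how the state is threaded through rather than removing the sequential dependence.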