I hope someone can help a desperate student :-) I have a set of procedure codes, each with a different number of surgeries (here: procedures) and their respective durations. I would like to compute descriptive statistics on the durations for each code. Before computing them, I would like my loop to detect and remove outliers using the IQR rule. This is the code without outlier detection and removal:

# variables for output - run before each loop
Counter0<-1
Procedure_codes<-NULL
Number<-NULL
Min_Times<-NULL
Max_Times<-NULL
Average_Times<-NULL
Median_Times<-NULL
SD_Times<-NULL

#loop over all procedure codes
while(Counter0<=number_of_different_procedurecodes) {
  a_g_procedures2<-NULL
  Procedure_Name<-eval(list_of_procedurecodes[Counter0])
  Procedure_name<-unlist(Procedure_Name)
  print(Procedure_Name)
  a_g_procedures2$Duration<-NULL
  Durations<-NULL
  number_of_procedures<-0
  #Subset data for the specific procedure
  a_g_procedures2<-subset(a_g_procedures1,ProcedureCode==Procedure_Name)
  number_of_procedures<-length(a_g_procedures2$ProcedureCode)
  Counter1<-1

  #loop over specific procedure
  while(Counter1<=number_of_procedures){
   a_g_procedures$Duration<-NULL
    TimeIn_1_Selected<-a_g_procedures2$"TimeIn_1"[Counter1]
    TimeIn_1_Selected<-as.POSIXct(TimeIn_1_Selected,format="%d/%m/%Y %H:%M")
    TimeIn_1_S<-as.numeric(TimeIn_1_Selected)
    
    TimeIn_2_Selected<-a_g_procedures2$"TimeIn_2"[Counter1]
    TimeIn_2_Selected<-as.POSIXct(TimeIn_2_Selected,format="%d/%m/%Y %H:%M")
    TimeIn_2_S<-as.numeric(TimeIn_2_Selected)
    
    TimeOut_Selected<-a_g_procedures2$"TimeOut"[Counter1]
    TimeOut_Selected<-as.POSIXct(TimeOut_Selected,format="%d/%m/%Y %H:%M")
    
    
    if (TimeIn_1_S>TimeIn_2_S) {
      
      Start_Time<-TimeIn_2_Selected
    }
    if (TimeIn_1_S<=TimeIn_2_S) {
      Start_Time<-TimeIn_1_Selected
    }
    print (Start_Time)
    print(TimeOut_Selected)
    
    Duration<-difftime(TimeOut_Selected, Start_Time, units = "mins")
    Durations<-c(Durations,Duration)

    Counter1<-Counter1+1
  }
  
  Procedure_codes<-c(Procedure_codes,Procedure_name)
  Durations<-as.numeric(Durations)
  Mean_Time<-round(mean(Durations, na.rm=TRUE), digits=1)  # mean() silently ignores digits=; round explicitly
  SD_Time<-sd(Durations,na.rm=TRUE)
  Min_Time<-min(Durations, na.rm=TRUE)
  Max_Time<-max(Durations, na.rm=TRUE)
  Median_Time<-median(Durations, na.rm=TRUE)
  Average_Times<-c(Average_Times,Mean_Time)
  SD_Times<-c(SD_Times,SD_Time)
  Min_Times<-c(Min_Times, Min_Time)
  Max_Times<-c(Max_Times, Max_Time)
  Median_Times<-c(Median_Times, Median_Time)
  Number<-c(Number,number_of_procedures)
  Counter0<-Counter0+1  
}

ag_output<-data.frame(Procedure_codes,Number,Min_Times, Max_Times, Average_Times, Median_Times, SD_Times)

This is what I would like to add to the loop over the specific procedure:

Q<-quantile(Duration, probs=c(.25,.75), na.rm=FALSE)
iqr<-IQR(Duration)
up<-Q[2]+1.5*iqr
low<-Q[1]-1.5*iqr
remove<-Duration>(Q[1]-1.5*iqr) & Durations<(Q[1]-1.5*iqr)
setdiff(Duration, remove)

Does somebody have an idea how I could do this?

Thank you very much in advance!

Kaya
  • Hello :) Please consider making your question [reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example); this will greatly increase your chances of receiving an answer. Also, explicit loops are generally not recommended practice in R; have you tried `apply()`, `lapply()`, etc.? In this case, I would write custom functions (or find existing ones in other R packages) and then apply them to the dataset. Note that adopting this `apply` mindset might also help you make your example reproducible. – Paul Jun 22 '20 at 11:24
  • Also, [this](https://stackoverflow.com/questions/4787332/how-to-remove-outliers-from-a-dataset) might contain some ideas. – Paul Jun 22 '20 at 11:26
  • Thanks, Paul, will do! – Kaya Jun 22 '20 at 11:48
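
Following up on the `lapply()` suggestion in the comments, here is a minimal sketch of what a loop-free version might look like. It assumes the column names from the question (`ProcedureCode`, `TimeIn_1`, `TimeIn_2`, `TimeOut`) and the same date format; the toy data, the `tz = "UTC"` setting, and the helper names `parse_time` and `summarise_durations` are invented for illustration and are not part of the original code.

```r
# Toy data using the question's column names; all values are invented.
a_g_procedures1 <- data.frame(
  ProcedureCode = c("A", "A", "A", "A", "A", "B", "B"),
  TimeIn_1 = c("01/06/2020 08:00", "01/06/2020 10:05", "02/06/2020 08:00",
               "02/06/2020 11:00", "03/06/2020 08:00", "01/06/2020 13:00",
               "02/06/2020 13:10"),
  TimeIn_2 = c("01/06/2020 08:10", "01/06/2020 10:00", "02/06/2020 08:05",
               "02/06/2020 11:00", "03/06/2020 08:15", "01/06/2020 13:05",
               "02/06/2020 13:00"),
  TimeOut  = c("01/06/2020 09:00", "01/06/2020 11:05", "02/06/2020 09:10",
               "02/06/2020 12:02", "03/06/2020 18:00", "01/06/2020 13:30",
               "02/06/2020 13:40"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: parse a timestamp column into seconds since the epoch.
parse_time <- function(x) {
  as.numeric(as.POSIXct(x, format = "%d/%m/%Y %H:%M", tz = "UTC"))
}

# Start time is the earlier of the two "in" times; duration in minutes.
start <- pmin(parse_time(a_g_procedures1$TimeIn_1),
              parse_time(a_g_procedures1$TimeIn_2))
a_g_procedures1$Duration <- (parse_time(a_g_procedures1$TimeOut) - start) / 60

# Hypothetical helper: drop values outside the 1.5 * IQR fences, then summarise.
summarise_durations <- function(x) {
  q <- unname(quantile(x, probs = c(0.25, 0.75), na.rm = TRUE))
  fence <- 1.5 * (q[2] - q[1])
  kept <- x[!is.na(x) & x >= q[1] - fence & x <= q[2] + fence]
  data.frame(Number = length(kept), Min_Time = min(kept), Max_Time = max(kept),
             Average_Time = mean(kept), Median_Time = median(kept),
             SD_Time = sd(kept))
}

# One summary row per procedure code, bound into a single data frame.
per_code <- lapply(split(a_g_procedures1$Duration, a_g_procedures1$ProcedureCode),
                   summarise_durations)
ag_output <- data.frame(Procedure_code = names(per_code), do.call(rbind, per_code))
rownames(ag_output) <- NULL
print(ag_output)
```

Note that `Number` here counts the procedures left after outlier removal; if the original count is wanted, `length(x)` could be reported alongside it.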

1 Answer

Make it a function?

f.remove_outliers_IQR <- function(Duration)
{
  # Quartiles and IQR; na.rm=TRUE so that missing durations do not raise an error
  Q <- quantile(Duration, probs=c(.25,.75), na.rm=TRUE)
  iqr <- IQR(Duration, na.rm=TRUE)
  up <- Q[2]+1.5*iqr
  low <- Q[1]-1.5*iqr
  # TRUE for non-missing values inside the fences [low, up]
  keep <- !is.na(Duration) & Duration>=low & Duration<=up
  Duration_out <- Duration[keep]
  return(Duration_out)
}

and call it in the main loop, maybe just before Counter1<-Counter1+1?
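
One thing to watch: inside the inner loop, `Duration` holds a single value, and an IQR rule applied to one value cannot flag anything. The sketch below illustrates this under the assumption that the filter is instead applied to the accumulated `Durations` vector once the inner loop has finished. The helper name `remove_outliers_iqr` and the example durations are invented for illustration.

```r
# Hypothetical helper: keep only values inside the 1.5 * IQR fences.
remove_outliers_iqr <- function(x) {
  q <- unname(quantile(x, probs = c(0.25, 0.75), na.rm = TRUE))
  fence <- 1.5 * (q[2] - q[1])
  x[!is.na(x) & x >= q[1] - fence & x <= q[2] + fence]
}

# Invented durations (minutes) for one procedure code,
# as the inner loop would have collected them.
Durations <- c(60, 65, 70, 62, 600)

# Called on one value at a time, the quartiles equal that value and the IQR
# is 0, so even an extreme duration is returned unchanged.
single <- remove_outliers_iqr(600)

# Called once on the whole vector, the extreme value is dropped.
Durations_clean <- remove_outliers_iqr(Durations)

# The descriptive statistics are then computed from the cleaned vector.
Min_Time    <- min(Durations_clean)
Max_Time    <- max(Durations_clean)
Mean_Time   <- mean(Durations_clean)
Median_Time <- median(Durations_clean)
SD_Time     <- sd(Durations_clean)
number_of_procedures <- length(Durations_clean)
```

In the question's code this would correspond to filtering `Durations` right after `Durations<-as.numeric(Durations)`, before `mean()`, `sd()` and the others are called.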

efz
  • Do I need to change anything in the code of the main loop after Counter1<-Counter1+1 then? Because Min_Times, Mean_Time etc. are calculated from Durations? – Kaya Jun 22 '20 at 10:42
  • I understood that your code calculates `Duration` and that you want to remove outlying values of `Duration` before qualifying your procedures. If that is correct, then yes, you should run the IQR test on `Duration` after the line `Duration<-difftime(TimeOut_Selected, Start_Time, units = "mins")`. At least that is how I interpret your code. – efz Jun 22 '20 at 10:51
  • Yes, thanks. Unfortunately, it does not work for some reason. Would I need to change `Min_Time<-min(Durations, na.rm=TRUE)` to `Min_Time<-min(Duration_out, na.rm=TRUE)`? Sorry, I have actually never used functions before... I also just tried `Duration[!Duration %in% boxplot.stats(Duration)$out]`, but the resulting values still include outliers. – Kaya Jun 22 '20 at 11:44
  • Sorry, I just realised: I am using `Durations<-c(Durations, Duration)`, so do I need to change this to `Durations<-c(Durations, Duration_out)`? – Kaya Jun 22 '20 at 11:47
  • Yes, either that or you call `Duration <- f.remove_outliers_IQR(Duration)`. It is always difficult to give suggestions without a reproducible example. In your case I am assuming that your code has been checked and that the outlier detection method works as you would expect. For example, could the return value of the function be `NA`? Note that `setdiff(Duration, remove)` is not the same as `setdiff(remove, Duration)` (see https://www.rdocumentation.org/packages/prob/versions/1.0-1/topics/setdiff). – efz Jun 22 '20 at 12:28
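
To make the `setdiff()` point concrete, here is a small sketch on invented numbers. It assumes that `remove` is a logical vector, as in the snippet from the question: `setdiff()` then compares the durations with `TRUE`/`FALSE` (coerced to 1/0) and also drops duplicates, so it does not act as a filter. Logical indexing, or the `boxplot.stats()` idiom mentioned in the comments, does the filtering when applied to the whole vector.

```r
# Invented durations (minutes) with one duplicate and one extreme value.
x <- c(60, 62, 65, 65, 70, 600)

# Fences from the 1.5 * IQR rule.
q    <- unname(quantile(x, probs = c(0.25, 0.75)))
low  <- q[1] - 1.5 * (q[2] - q[1])
up   <- q[2] + 1.5 * (q[2] - q[1])
flag <- x < low | x > up              # logical: TRUE marks an outlier

# setdiff() treats `flag` as a set of values (0 and 1), not as a mask:
# the extreme value survives and the duplicated 65 is collapsed.
via_setdiff <- setdiff(x, flag)       # 60 62 65 70 600

# Logical indexing uses `flag` as a mask and keeps duplicates.
via_index <- x[!flag]                 # 60 62 65 65 70

# boxplot.stats() uses hinges rather than quantile(), so its fences can
# differ slightly in general; on this vector the result is the same.
via_boxplot <- x[!x %in% boxplot.stats(x)$out]
```

If the `boxplot.stats()` version still seemed to keep outliers, one possible reason is that it was run on a single `Duration` inside the inner loop rather than on the full `Durations` vector.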