I am filtering data for analysis and have run into a problem I cannot find a solution for. I looked into the prepdat package, but it does not seem to meet my needs. My data frame (df) consists of reaction times of several participants measured over 4 blocks. To filter out outliers, I need to apply a (mean +/- 2.5 sd) rule to every block of each participant.
I tried writing my own function to apply this rule to every subsection of my data frame (each block of every participant separately). I wrote the function below so that I can call it in a for loop (a loop might not be optimal in R, but that is not the main concern here):
filter <- function(subject, block) {
  m <- mean(df[df$subj == subject & df$block == block, 3])
  stdv <- sd(df[df$subj == subject & df$block == block, 3])
  lowerbound <- m - 2.5 * stdv
  upperbound <- m + 2.5 * stdv
  # Here I retrieve the index for all the rows I need to eliminate
  outliers <- which(df[df$subj == subject & df$block == block, 3] <= lowerbound |
                    df[df$subj == subject & df$block == block, 3] >= upperbound)
  df <<- df[-c(outliers), ]
}
I can't get my head around this indexing. For the first block of the first subject there seems to be no problem, and the function deletes the right rows. For the following blocks (and subjects), 'outliers' still contains the correct indexes within the subset (subject and block) that I select in the function, but when I use them to remove rows, they appear to be applied to the rows of my whole data frame rather than to that subset. Is there something I am missing or not (yet) aware of? Or is my overall way of thinking wrong? (I am still adapting to R.)
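The behaviour described above can be reproduced in a small sketch. The data frame `demo` and all of its values are made up for illustration (they are not the real data); only the column names `subj`, `block`, and `rt` follow the question. The sketch shows that `which()` applied to a subset returns positions within that subset, and one possible way to map them back to rows of the full data frame.

```r
# Made-up data: one subject, two blocks of 10 trials, one extreme rt per block
typical <- c(300, 310, 290, 305, 295, 300, 310, 290, 300)
demo <- data.frame(
  subj  = 1,
  block = rep(1:2, each = 10),
  rt    = c(append(typical, 2000, after = 3),   # extreme value in row 4
            append(typical, 2000, after = 6))   # extreme value in row 17
)

# Logical mask over ALL rows of demo, selecting subject 1, block 2
in_group <- demo$subj == 1 & demo$block == 2
x <- demo$rt[in_group]

# which() counts positions inside the subset x, not rows of demo
rel <- which(x <= mean(x) - 2.5 * sd(x) | x >= mean(x) + 2.5 * sd(x))
rel        # 7: the 7th trial of block 2

# Translate subset positions into row positions of the full data frame
abs_rows <- which(in_group)[rel]
abs_rows   # 17

# Using the relative position removes row 7, which belongs to block 1
wrong <- demo[-rel, ]

# A logical drop mask over the full data frame removes the intended row
drop <- rep(FALSE, nrow(demo))
drop[abs_rows] <- TRUE
fixed <- demo[!drop, ]
```

For the first block of the first subject the two kinds of position coincide, which would explain why only later blocks go wrong. The logical mask is used here instead of negative indexing because `demo[-integer(0), ]` returns zero rows, so a block without any outliers would otherwise empty the data frame.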
   subj block  rt
1     1     2 345
2     1     2 118
3     1     2 302
4     1     2 698
5     1     2 154
6     2     3 347
7     2     3 391
8     2     3 414
9     2     3 427
10    2     3 369
11    6     1 685
12    6     1 369
13    6     1 457
14    6     1 566
15    6     1 542
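As an alternative to the loop, the (mean +/- 2.5 sd) rule could be applied to all subject/block combinations at once with base R's `ave()`. This is only a sketch under assumptions: `df_demo` is a made-up data frame with the same column layout as the sample above, and `within_bounds` is a hypothetical helper name, not part of any package.

```r
# Made-up data with the same columns as the sample (subj, block, rt)
typical <- c(300, 310, 290, 305, 295, 300, 310, 290, 300)
df_demo <- data.frame(
  subj  = rep(c(1, 2), each = 10),
  block = rep(c(2, 3), each = 10),
  rt    = c(typical, 2000,   # subject 1: last trial is extreme
            typical, 320)    # subject 2: no extreme trial
)

# Hypothetical helper: TRUE for values strictly inside mean +/- k * sd
within_bounds <- function(x, k = 2.5) {
  abs(x - mean(x)) < k * sd(x)
}

# ave() applies the helper within each subj/block group and returns the
# results in the original row order (as 0/1, hence as.logical)
keep <- as.logical(ave(df_demo$rt, df_demo$subj, df_demo$block,
                       FUN = within_bounds))

df_clean <- df_demo[keep, ]
```

Because `keep` is a logical vector with one entry per row of the full data frame, no translation between subset positions and row positions is needed. With only five trials per group, as in the sample above, no value can lie 2.5 sample standard deviations from its group mean, which is why the made-up data uses ten trials per group. A group with a single trial has an `sd()` of `NA` and would need separate handling.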