
I have found that someone else has a similar problem here. But I am very new to programming and to R and I don't understand how I can adapt the answers to my situation. My data looks like what the following code can generate:

df1 = data.frame(ACC1 = sample((1:0), 16, replace = TRUE), RT1 = sample((1000:2000), 16, replace = TRUE))
df2 = data.frame(ACC2 = sample((1:0), 16, replace = TRUE), RT2 = sample((1000:2000), 16, replace = TRUE))
cbind(df1,df2)

Basically I have a number of accuracy variables (ACC), each paired with a reaction time variable (RT): ACC1 corresponds to RT1 and so on. Each row is a single participant. In my real data each participant has done hundreds of trials, but this mock data has only 2 trials. What I am looking for is an efficient way to pick out the reaction times only for the trials where the participant gives a correct response (i.e. ACC = 1), and then calculate each participant's mean reaction time over those correct trials. I hope my question is clear, and thank you very much in advance for your help.

TVV

1 Answer

If I understood correctly, you have a data frame with columns ACC1, RT1, ..., ACC100, RT100, and for each participant (each row) you would like to compute the mean of RTx over only those trials where ACCx is 1.

To go over rows, a good way is to use the apply function in R.

input<-cbind(df1, ... df100) ###placeholder: bind all your data frames here
subset<-grep(pattern = "ACC", x = colnames(input)) ###which are the ACC columns?
result<-apply(X = input,
    MARGIN = 1, ### tells apply to go row by row not column by column
    FUN = function(z){ # an anonymous function
     sub<-which(z[subset]==1) ##Returns x if ACCx is 1
     return(mean(z[2*sub])) ## If ACCx is 1, we want column 2*x (RTx); assumes only ACC1,RT1,ACC2,RT2,... columns, as in your example
    })
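
As a self-contained sketch of this row-wise approach, here it is on mock data shaped like the question's. The fixed seed and the name `acc_cols` are my own additions for illustration, and the `2 * correct` indexing assumes the columns are exactly ACC1, RT1, ACC2, RT2 with nothing else in the data frame.

```r
set.seed(1)  # assumption: fixed seed so the mock data is reproducible

# Mock data as in the question: 16 participants, 2 trials each
df1 <- data.frame(ACC1 = sample(0:1, 16, replace = TRUE),
                  RT1  = sample(1000:2000, 16, replace = TRUE))
df2 <- data.frame(ACC2 = sample(0:1, 16, replace = TRUE),
                  RT2  = sample(1000:2000, 16, replace = TRUE))
input <- cbind(df1, df2)  # columns: ACC1, RT1, ACC2, RT2

acc_cols <- grep("^ACC", colnames(input))  # column positions 1 and 3

result <- apply(input, 1, function(z) {
  correct <- which(z[acc_cols] == 1)  # x such that ACCx is 1
  mean(z[2 * correct])                # RTx sits in column 2 * x
})

result  # one mean per participant; NaN where no trial was correct
```

A participant with no correct trial gets `NaN`, because the mean of an empty vector is `NaN` in R.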

Edit: for columns that are not in order, or with other columns in between:

input<-cbind(df1, ... df100) ###placeholder: bind all your data frames here
C<-colnames(input)
subset<-grep(pattern = "^ACC", x = C) ###which are the ACC columns?
result<-apply(X = input,
    MARGIN = 1, ### tells apply to go row by row not column by column
    FUN = function(z){ # an anonymous function
     correct<-C[subset][which(as.numeric(z[subset])==1)] ##Names ACCx of the correct trials
     RTnames<-sub(pattern = "^ACC", replacement = "RT", x = correct) ##ACCx -> RTx, matched by name so column order does not matter
     return(mean(as.numeric(z[C %in% RTnames]))) ##as.numeric: non-numeric columns turn the whole row into character
    })
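
For the layout described in the comments (Subject number, Condition, then ACC1 ... ACC100, then RT1 ... RT100), a vectorized alternative is to mask the incorrect trials and take row means. This is a swapped-in technique rather than the apply approach above, and only a sketch: the column names `Subject` and `Condition`, the sizes, and the output column `meanRT_correct` are hypothetical.

```r
set.seed(42)  # assumption: fixed seed for reproducible mock data
n_subj <- 5   # hypothetical number of participants
n_trials <- 4 # hypothetical number of trials

acc <- matrix(sample(0:1, n_subj * n_trials, replace = TRUE), nrow = n_subj,
              dimnames = list(NULL, paste0("ACC", 1:n_trials)))
rt <- matrix(sample(1000:2000, n_subj * n_trials, replace = TRUE), nrow = n_subj,
             dimnames = list(NULL, paste0("RT", 1:n_trials)))
input <- data.frame(Subject = 1:n_subj,
                    Condition = rep(c("a", "b"), length.out = n_subj),
                    acc, rt)

# Pair columns by name, so extra columns and column order do not matter
acc_names <- grep("^ACC[0-9]+$", colnames(input), value = TRUE)
rt_names <- sub("^ACC", "RT", acc_names)

acc_m <- as.matrix(input[, acc_names])
rt_m <- as.matrix(input[, rt_names])
rt_m[is.na(acc_m) | acc_m != 1] <- NA  # blank out RTs of incorrect trials

input$meanRT_correct <- rowMeans(rt_m, na.rm = TRUE)  # NaN if no correct trial
```

Selecting only the ACC and RT columns before converting to a matrix keeps the matrices numeric even when the data frame also holds character columns such as a condition label.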
DeveauP
  • Thank you very much. It works on my mock data! However, I run into another minor problem when I try this on my real data. That is, not every ACC variable is followed by its corresponding RT variable. The columns are ACC1, ACC2, ACC3, ..., ACC100, RT1, RT2, RT3, ..., RT100. So I modified the last line of your code to `return(mean(z[sub+100]))`, but R returns incorrect results. Maybe it's a very stupid question, or maybe I don't understand how *apply* works, or maybe I just don't understand your code? – TVV Mar 31 '16 at 20:22
  • The way you wrote it should work. Do you have other columns in your data frame? What is your control to know that the function is not working properly? – DeveauP Apr 01 '16 at 07:12
  • You are right. I did have other columns (that are not named with ACC or RT) in my data frame. Now I took the subset with only the ACC and RT variables, and it works correctly! But I still don't understand: the object *subset* should only contain the columns whose names are ACC something and whose values are 1, right? Then why would the presence of other columns interfere with the calculation? (Btw, I use Excel to check if the means are correct.) – TVV Apr 01 '16 at 10:04
  • Maybe some columns are in-between. Grep returns the index of the element matching your search. I'm editing and putting a very safe way to do it. – DeveauP Apr 01 '16 at 11:15
  • There is nothing in between the ACC and RT columns though. They belong to one block. Other columns either precede or follow this block. Thank you so much for your help. I really appreciate it. – TVV Apr 01 '16 at 12:13
  • Hi DeveauP, I tried your new code but it doesn't work on my data. I don't understand what *[C %in% RTnames]* does there. The results are just NaNs. The header of my data is as follows: Subject number, Condition, ACC1, ACC2, ACC3, ... ACC100, RT1, RT2, RT3, ... RT100. That's all. – TVV Apr 07 '16 at 19:42
  • Hi, C is the vector of column names and RTnames holds the names of the columns for which you want to compute the mean. C %in% RTnames gives a logical vector that says, for each position of C, whether that element is in the vector RTnames. – DeveauP Apr 08 '16 at 11:33
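
To make the two points from the comments concrete (grep returns positions within the full set of column names, and `%in%` returns a logical vector), here is a small sketch on a made-up header and a single made-up row. The header, the values in `z`, and the numeric coding of Condition are all hypothetical. It also suggests one likely reason a fixed offset such as `sub + 100` picks the wrong columns once other columns precede the ACC block.

```r
# Hypothetical header with two extra columns in front and 2 trials
C <- c("Subject", "Condition", "ACC1", "ACC2", "RT1", "RT2")

# One hypothetical participant; trial 1 correct, trial 2 incorrect
z <- c(Subject = 7, Condition = 2, ACC1 = 1, ACC2 = 0, RT1 = 1200, RT2 = 1500)

acc_idx <- grep("ACC", C)       # positions within C: 3 4
sub <- which(z[acc_idx] == 1)   # position within the ACC block: 1

# A fixed offset (here 2 trials, 100 in the real data) counts from column 1,
# so the two leading columns shift it onto the wrong column:
z[sub + 2]                      # selects ACC1 (value 1), not RT1

# Selecting by name is unaffected by the extra columns:
RTnames <- paste("RT", sub, sep = "")  # "RT1"
C %in% RTnames                  # FALSE FALSE FALSE FALSE TRUE FALSE
z[C %in% RTnames]               # selects RT1 (value 1200)
```

Pasting the trial number onto "RT" works here because the ACC columns are in numeric order, so the position within the ACC block equals the trial number.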