I'm having an issue with the sdTrim function, which had previously run perfectly.

I have a dataframe (= new_data) containing the following variable names:

[Image: variable names in new_data]

There are 8 different conditions: FA_1, HIT_1, ..., FA_4, HIT_4

I wanted to trim the reaction times and calculate a mean per participant and per condition. I used the following code:

trimmedData <- sdTrim(new_data, minRT = 150, sd = 2, pptVar = "participant", condVar = "condition", rtVar = "rt", accVar = "accuracy", perParticipant = TRUE, returnType = "mean")

This used to work fine, but suddenly my condition variable is no longer recognized as such: instead of 8 separate condition columns, everything is put into one:

[Image: sdTrim output with all conditions collapsed into one column]

What seems to be the issue here?

I tried setting perCondition = TRUE and perCondition = FALSE, among other variations, but nothing changed.

The participant and condition variables are character; rt is numeric.

  • Can you make your post [reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) and provide your data using `dput()`? – jrcalabrese Jan 18 '23 at 14:46
  • of course: this is a small section of the data frame containing information from 2 participants and for each of the 4 conditions. structure(list(participant = c(986, 986, 986, 986, 986, 986, 986, 986, 988, 988, 988, 988, 988, 988), accuracy = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), condition = c("hit_1", "FA_1", "hit_2", "FA_2", "hit_3", "FA_3", "FA_4", "hit_4", "hit_1", "FA_1", "hit_2", "FA_2", "hit_3", "hit_4"), rt = c(638, 286, 348, 310, 404, 301, 216, 534, 348, 276, 256, 293, 495, 438)), row.names = c(NA, -14L), class = c("tbl_df", "tbl", "data.frame")) – nina_stats Jan 19 '23 at 13:36

1 Answer

As far as I can tell, the problem is with your data, not with your code. The example data you posted only has one row per participant/condition at most; there isn't a FA_3 or FA_4 for participant 988. If your real data doesn't have enough data for each combination of participant and conditions, then it looks like sdTrim just averages by participant.

I'm unfamiliar with reaction time data, but you might be able to accomplish what you're looking for using group_by and summarize from dplyr.
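A minimal sketch of that dplyr route, assuming the column names from the question (participant, condition, accuracy, rt) and the same 150 ms floor and 2 SD cutoff. The toy data and the names `toy_data` and `manual_means` are made up for illustration, and this is a hand-rolled approximation rather than trimr's own implementation. One deliberate difference: a cell with a single trial is kept as it is instead of becoming NA.

```r
library(dplyr)

# Hypothetical toy data: two participants, two conditions.
# Participant 2 has only one FA_1 trial on purpose.
toy_data <- data.frame(
  participant = c(rep("1", 8), rep("2", 5)),
  condition   = c(rep("hit_1", 4), rep("FA_1", 4), rep("hit_1", 4), "FA_1"),
  accuracy    = 1L,
  rt          = c(400, 420, 410, 100, 300, 310, 305, 900,
                  500, 520, 510, 530, 350)
)

manual_means <- toy_data %>%
  # keep correct trials above the minimum RT
  filter(accuracy == 1, rt > 150) %>%
  group_by(participant, condition) %>%
  # drop trials above mean + 2 SD within each cell;
  # single-trial cells have no SD, so they pass through untouched
  filter(n() < 2 | rt < mean(rt) + 2 * sd(rt)) %>%
  summarise(mean_rt = mean(rt), n_trials = n(), .groups = "drop")

print(manual_means)
```

The `n_trials` column makes it easy to spot cells whose mean rests on very few trials.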

Below is an example with a larger dataset based on your example data.

library(trimr)
set.seed(123)
participant <- c(rep("1", 100), rep("2", 100), rep("3", 100))
accuracy <- sample(x = c("1", "0"), size = 300, replace = TRUE, prob = c(.9, .1))
condition <- sample(x = c("hit_1", "FA_1", "hit_2", "FA_2", "hit_3", "FA_3", "FA_4", "hit_4", "hit_1", "FA_1", "hit_2", "FA_2", "hit_3", "hit_4"), size = 300, replace = TRUE)
rt <- sample(x = 250:625, size = 300)
new_data <- data.frame(participant, accuracy, condition, rt)

trimmedData <- sdTrim(data = new_data, 
                      minRT = 150, 
                      sd = 2, 
                      pptVar = "participant", 
                      condVar = "condition", 
                      rtVar = "rt", 
                      accVar = "accuracy", 
                      perParticipant = TRUE, 
                      returnType = "mean")

print(trimmedData)
  participant    FA_1   hit_1  hit_3   hit_2    FA_4    FA_2  FA_3   hit_4
1           1 439.800 477.250 433.85 440.375 426.286 439.500 508.8 457.429
2           2 477.067 489.933 466.50 360.000 405.000 387.533 427.2 428.364
3           3 398.333 446.500 438.00 362.077 445.000 432.333 419.2 497.125

Update (1/23/23)

In both your original and your updated datasets, you simply don't have enough values per condition to use sdTrim() with both perParticipant = TRUE and perCondition = TRUE (perCondition defaults to TRUE if you don't specify it).

Here is a link to the sdTrim() function on GitHub. Start looking at line 545, which handles the case where both perParticipant and perCondition are set to TRUE.

Part of this function involves taking the standard deviation of the data for each combination of participant and condition. If you only have one value for a combination of participant and condition, its standard deviation will be NA. See the example below, which uses only participant 988 and condition hit_4. Once the standard deviation is NA, every value computed from it is NA as well.

You need either a larger dataset with more values for each combination of participant and condition, or you need to set both perParticipant and perCondition to FALSE. With the second option, you will have two NaN values because those values fall under the minRT threshold that you set. However, you can avoid that by also setting returnType = "raw".

new_data <- structure(list(participant = c("986", "986", "986", "986", "986", "986", "986", "986", "988", "988", "988", "988", "988", "988", "988", "988"), accuracy = c("1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1"), condition = c("hit_1", "FA_1", "hit_2", "FA_2", "hit_3", "FA_3", "FA_4", "hit_4", "hit_1", "FA_1", "hit_2", "FA_2", "hit_3", "hit_4", "FA_3", "FA_4"), rt = c(638, 286, 348, 310, 404, 301, 216, 534, 348, 276, 256, 293, 495, 438, 73, 73)), row.names = c(NA, -16L), class = "data.frame")
stDev <- 2
minRT <- 150

# get the list of participant numbers
participant <- unique(new_data$participant)

# get the list of experimental conditions
conditionList <- unique(new_data$condition)

# trim the data
trimmedData <- new_data[new_data$rt > minRT, ]

# ready the final data set
finalData <- as.data.frame(matrix(0, nrow = length(participant), ncol = length(conditionList)))

# give the columns the condition names
colnames(finalData) <- conditionList

# add the participant column
finalData <- cbind(participant, finalData)

# convert to data frame
finalData <- data.frame(finalData)

# initialise looping variables for subjects
i <- 1
j <- 2

# take apart the loop
# focus on participant 988, condition hit_4
currSub <- "988"
currCond <- "hit_4"

# get relevant data
tempData <- trimmedData[trimmedData$participant == currSub & trimmedData$condition == currCond, ]

# find the cutoff
curMean <- mean(tempData$rt)
print(curMean)
[1] 438
curSD <- sd(tempData$rt)
print(curSD) # <- here is where the NA values start
[1] NA
curCutoff <- curMean + (stDev * curSD)
    
# trim the data
curData <- tempData[tempData$rt < curCutoff, ]
    
# find the average, and add to the data frame
finalData[i, j] <- round(mean(curData$rt))
head(finalData)
  participant hit_1 FA_1 hit_2 FA_2 hit_3 FA_3 FA_4 hit_4
1         986    NA    0     0    0     0    0    0     0
2         988     0    0     0    0     0    0    0     0
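
Before calling sdTrim(), it may help to count how many trials each participant/condition cell has left after the minRT filter. The following is a rough base R sketch using the 16-row data frame from the comments; the names `kept`, `cell_counts` and `too_small` are my own and are not part of trimr.

```r
# The 16-row data frame posted in the comments.
new_data <- data.frame(
  participant = rep(c("986", "988"), each = 8),
  condition = c("hit_1", "FA_1", "hit_2", "FA_2", "hit_3", "FA_3", "FA_4", "hit_4",
                "hit_1", "FA_1", "hit_2", "FA_2", "hit_3", "hit_4", "FA_3", "FA_4"),
  accuracy = "1",
  rt = c(638, 286, 348, 310, 404, 301, 216, 534,
         348, 276, 256, 293, 495, 438, 73, 73)
)

minRT <- 150

# Keep only the trials above the minimum RT.
kept <- new_data[new_data$rt > minRT, ]

# Number of trials per participant x condition cell.
cell_counts <- table(kept$participant, kept$condition)
print(cell_counts)

# List the cells with fewer than two trials; sd() is NA for a single value.
too_small <- as.data.frame(cell_counts, stringsAsFactors = FALSE)
names(too_small) <- c("participant", "condition", "n")
too_small <- too_small[too_small$n < 2, ]
print(too_small)
```

In this dataset every cell ends up in `too_small`, and the two cells with a count of 0 (FA_3 and FA_4 for participant 988) are the ones whose only trial was below minRT.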
  • Thank you! There seems to be something off with my data frame. Even if I add the two conditions for subject 988, the same "error" occurs. However, I noticed that when I add new_data <- as.data.frame(new_data), the conditions are suddenly separated correctly. Only now I have the issue, that the mean is not generated correctly (instead it says: NA). – nina_stats Jan 19 '23 at 16:15
  • Can you post your updated dataset? – jrcalabrese Jan 19 '23 at 16:45
  • sure, this is the updated data frame: structure(list(participant = c("986", "986", "986", "986", "986", "986", "986", "986", "988", "988", "988", "988", "988", "988", "988", "988"), accuracy = c("1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1"), condition = c("hit_1", "FA_1", "hit_2", "FA_2", "hit_3", "FA_3", "FA_4", "hit_4", "hit_1", "FA_1", "hit_2", "FA_2", "hit_3", "hit_4", "FA_3", "FA_4"), rt = c(638, 286, 348, 310, 404, 301, 216, 534, 348, 276, 256, 293, 495, 438, 73, 73)), row.names = c(NA, -16L), class = "data.frame") – nina_stats Jan 23 '23 at 08:31
  • I have updated my post and took apart `sdTrim()` to show where it goes wrong. You either need a larger dataset or you need to change your argument specifications within `sdTrim()`. – jrcalabrese Jan 23 '23 at 15:26
  • thank you so much! I was able to solve it now. It's still a bit strange to me as I ran this exact code without issues maybe a year ago. – nina_stats Jan 23 '23 at 21:44
  • [`trimr`](https://www.jimgrange.org/post/update-to-trimr-a-response-time-trimming-package-in-r/) underwent a major update in 2018, so it's possible that you only updated `trimr` recently. – jrcalabrese Jan 24 '23 at 16:29