Another basic question from an R newbie. I have a dataset: testMeanSD. Here is some relevant data, using dput() - my first time trying this for output, so I hope I have done it correctly:

testMeanSD <- structure(list(RT = c(1245L, 1677L, 1730L, 1066L, 994L), mean = c(1143.77777777778, 
1143.77777777778, 1143.77777777778, 1143.77777777778, 1143.77777777778
), sd = c(202.255299928596, 202.255299928596, 202.255299928596, 
202.255299928596, 202.255299928596), RT2 = c(1245L, 1677L, 1730L, 
1066L, 994L)), .Names = c("RT", "mean", "sd", "RT2"), row.names = c(NA, 
5L), class = "data.frame")

RT2 is just a duplicate of RT for me to modify. For each row, I need to alter the value of RT2 if it meets certain conditions. Otherwise RT2 stays the same as RT (or as the current value in RT2, which is the same thing). Here are the conditions:

  1. find all values in RT2 that exceed the Mean + 2.5 * SD and trim them to be equal to the Mean + 2.5 * SD

    if (RT2 > Mean + (2.5 * SD)) RT2 = Mean + 2.5 * SD

  2. find all values that are less than the Mean - 2.5 times the SD and trim them to be equal to the Mean - 2.5 * SD

    else if (RT2 < Mean - (2.5 * SD)) RT2 = Mean - 2.5 * SD

  3. leave everything else as is

    else
    RT2 = RT

I thought this would be fairly basic in R, but I simply can't find a way to make it work. Here are some of my attempts (all failed):

First:

testMeanSD$RT2 = testMeanSD$RT
if (testMeanSD$RT2 > (testMeanSD$mean + (2.5 * testMeanSD$sd))) {
    testMeanSD$RT2 = (testMeanSD$mean + (2.5 * testMeanSD$sd))
}
else if(testMeanSD$RT2 < (testMeanSD$mean - (2.5 * testMeanSD$sd))) {
    testMeanSD$RT2 = (testMeanSD$mean - (2.5 * testMeanSD$sd))
}
else {
    testMeanSD$RT2 = testMeanSD$RT
}

Second:

ifelse(testMeanSD$RT2 > (testMeanSD$mean + (2.5 * testMeanSD$SD)), testMeanSD$RT2 <- (testMeanSD$mean + (2.5 * testMeanSD$sd)),
    ifelse(testMeanSD$RT2 < (testMeanSD$Mean - (2.5 * testMeanSD$sd)), testMeanSD$RT2 <- (testMeanSD$mean - (2.5 * testMeanSD$sd)), testMeanSD$RT2 <- testMeanSD$RT)

Third:

testMeanSD$RT2 <- ifelse(testMeanSD$RT2 > (testMeanSD$mean + (2.5 * testMeanSD$sd)), testMeanSD$mean + (2.5 * testMeanSD$sd)),
   ifelse(testMeanSD$RT2 < (testMeanSD$mean - (2.5 * testMeanSD$SD)), (testMeanSD$mean - (2.5 * testMeanSD$sd)), testMeanSD$RT2 <- testMeanSD$RT)

I looked through some related posts, and this one seems closest: Loop over rows of dataframe applying function with if-statement

But it's not clear to me how to incorporate if/else logic into the approaches outlined there (if not as I have them above).

Any help would be greatly appreciated. Thanks!

D T
  • Welcome to Stack Overflow! You will find that you get better answers if you take the time to make your question reproducible. Please follow the guidelines (http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), paying special attention to the part about `dput()`. Thanks! – Ari B. Friedman Aug 20 '12 at 11:33
  • Hi Ari, thank you for pointing out my posting error - I want to do stuff right here. I'm looking over how to use dput() now. My dataset is huge, and I will obviously need to make it smaller or make some fake data, and I'm new to R, so this may take some time. Thanks - DT – D T Aug 20 '12 at 11:46
  • OK, done. I think I did it right. Thanks for pointing me in the right direction. – D T Aug 20 '12 at 12:02
  • That looks great. Thanks for adding the reproducible example! – Ari B. Friedman Aug 20 '12 at 12:54

1 Answer

You almost certainly want to avoid loops and if statements here in favor of vectorized conditionals and assignment.
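As a rough illustration of a vectorized conditional, here is a minimal sketch using nested ifelse(). The data frame name dat and the helper vectors upper and lower are assumptions for illustration; the values are copied from the dput() output in the question. The key point is that ifelse() returns a whole vector, so the assignment happens once, outside the call, rather than inside its branches.

```r
# Hypothetical data frame `dat`, built from the values in the question
dat <- data.frame(
  RT   = c(1245L, 1677L, 1730L, 1066L, 994L),
  mean = 1143.77777777778,   # recycled to every row
  sd   = 202.255299928596    # recycled to every row
)

# Per-row trimming bounds (illustrative helper names)
upper <- dat$mean + 2.5 * dat$sd
lower <- dat$mean - 2.5 * dat$sd

# ifelse() tests each element and returns a vector of the same length,
# so the result is assigned to RT2 in a single step.
dat$RT2 <- ifelse(dat$RT > upper, upper,
           ifelse(dat$RT < lower, lower, dat$RT))
```

With this sample, only the second and third rows exceed the upper bound, so only those two values are trimmed.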

Let's take your first example if (RT2 > Mean + (2.5 * SD)) RT2 = Mean + 2.5 * SD, assuming your data.frame is called dat:

sel <- dat$RT2 > dat$mean + 2.5 * dat$sd  # logical vector of length nrow(dat); the column is named sd (lowercase)
dat$RT2[sel] <- with(dat[sel, ], mean + 2.5 * sd)  # overwrite only the selected rows

You can use with() to save a lot of typing of "dat$".
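Putting both trimming conditions together, a minimal sketch might look like the following. It assumes a data frame named dat with the lowercase column names mean and sd used in the question's data; the index names too_high and too_low are made up for illustration. The third condition ("leave everything else as is") needs no code, because rows that match neither test are never touched.

```r
# Hypothetical data frame `dat`, built from the values in the question
dat <- data.frame(
  RT   = c(1245L, 1677L, 1730L, 1066L, 994L),
  mean = 1143.77777777778,
  sd   = 202.255299928596
)
dat$RT2 <- dat$RT  # working copy to modify

# Condition 1: trim values above mean + 2.5 * sd
too_high <- with(dat, RT2 > mean + 2.5 * sd)
dat$RT2[too_high] <- with(dat[too_high, ], mean + 2.5 * sd)

# Condition 2: trim values below mean - 2.5 * sd
too_low <- with(dat, RT2 < mean - 2.5 * sd)
dat$RT2[too_low] <- with(dat[too_low, ], mean - 2.5 * sd)
```

Inside with(), the names mean and sd refer to the data frame's columns, not to the base functions of the same name.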

N.B. I haven't tested this since there's no reproducible dataset. There's almost certainly a typo somewhere!
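As an alternative to two separate assignments, base R's pmax() and pmin() can clamp a vector to both bounds in one expression. This is a sketch of a different technique from the logical indexing above, again assuming a data frame named dat with lowercase mean and sd columns, built from the values in the question.

```r
# Hypothetical data frame `dat`, built from the values in the question
dat <- data.frame(
  RT   = c(1245L, 1677L, 1730L, 1066L, 994L),
  mean = 1143.77777777778,
  sd   = 202.255299928596
)

# pmax() raises anything below the lower bound up to it;
# pmin() then lowers anything above the upper bound down to it.
dat$RT2 <- with(dat, pmin(pmax(RT, mean - 2.5 * sd), mean + 2.5 * sd))
```

Both functions work element-wise, so each row is compared against its own bounds.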

Ari B. Friedman
  • You can also use `with` to set `sel`: `sel <- with(dat, RT2 > mean + 2.5*sd)`. – seancarmody Aug 20 '12 at 11:46
  • OK, if I understand correctly, you mean that I am just supposed to apply the first two conditions independently and skip the third. Seems to work. Thanks for your help! – D T Aug 20 '12 at 12:44
  • @user1603288 Precisely. This is one of those areas where it really pays to write R-like code rather than using `for` and `if`. – Ari B. Friedman Aug 20 '12 at 12:54