I would like to compute the standard deviation for each row in a data frame over a selection of columns after removing the minimum and the maximum in that selection. Here is an example:

set.seed(1)
dat <- data.frame(matrix(sample(c(1:100), 10, replace=TRUE), ncol=5))

I managed to calculate the sd of my columns of interest (1:4) for each row:

dat <- transform(dat, sd = apply(dat[,1:4], 1, sd))
show(dat)

  X1 X2 X3 X4 X5       sd
1 27 58 21 95 63 33.95463
2 38 91 90 67  7 24.93324

However, I can't figure out how to exclude min(dat[1,1:4]) and max(dat[1,1:4]) before calculating sd(). The result should be this:

  X1 X2 X3 X4 X5       sd
1 27 58 21 95 63 21.92031     # note: sd calculated by hand using 'sd(c(27,58))'
2 38 91 90 67  7 16.26346     # note: sd calculated by hand using 'sd(c(67,90))'

Can someone help me with this?

piptoma
  • What do you expect to be the result for `c(2, 2, 3, 4, 20, 20)`? Please edit your question! – jogo Jan 06 '16 at 16:21
  • @jogo I have a lot of decimals in my data, so your case where there are identical values will not become a problem. – piptoma Jan 08 '16 at 09:41

4 Answers

You could write a custom function to do this for you. It takes in a vector, removes the minimum and maximum, and returns the sd of the remaining values. Of course you could also write this as an anonymous function, but sometimes having the function separate makes the code more readable.

sd_custom <- function(x){
  # drop every value equal to the minimum or the maximum of x
  x <- x[x != min(x) & x != max(x)]
  return(sd(x))
}

dat$sd <- apply(dat[,1:4], 1, sd_custom)

> dat
  X1 X2 X3 X4 X5       sd
1 27 58 21 95 63 21.92031
2 38 91 90 67  7 16.26346
Heroka

You could try this:

dat$sd <- apply(dat[1:4], 1, function(x) sd(x[-c(which.min(x), which.max(x))]))
dat
  X1 X2 X3 X4 X5       sd
1 27 58 21 95 63 21.92031
2 38 91 90 67  7 16.26346
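
The index-based removal here and the value-based filters in the other answers can behave differently when the minimum or maximum is tied. A minimal sketch, using the vector from the comment on the question (the names `by_index` and `by_value` are only illustrative):

```r
# Vector with tied extremes, taken from the comment on the question
x <- c(2, 2, 3, 4, 20, 20)

# Index-based removal: which.min()/which.max() return the first position
# of the minimum/maximum, so exactly one of each is dropped
by_index <- x[-c(which.min(x), which.max(x))]

# Value-based removal: every element equal to the minimum or maximum is dropped
by_value <- x[x != min(x) & x != max(x)]

by_index      # 2 3 4 20
by_value      # 3 4

# The two subsets differ, so their standard deviations differ as well
sd(by_index)
sd(by_value)
```

With decimal data where ties are unlikely, as the asker describes, both approaches should give the same result.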
DatamineR

We can modify your code by replacing sd with an anonymous function that filters out the minimum and maximum first:

dat <- transform(dat, sd = apply(dat[,1:4], 1, function(x) sd(x[x<max(x) & x>min(x)])))
C_Z_

Another option is to combine range with setdiff:

dat$sd <- apply(dat[1:4], 1, function(x) sd(setdiff(x,range(x))))
dat$sd
#[1] 21.92031 16.26346
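
One thing to keep in mind is that `setdiff` returns unique values, so a value repeated in the middle of a row would be collapsed as well. A small sketch with a made-up row (`x`, `kept_setdiff`, and `kept_filter` are illustrative names, not part of the original data):

```r
# Hypothetical row with a repeated interior value (5 appears twice)
x <- c(1, 5, 5, 7, 9)

# setdiff() removes the range values and also de-duplicates the remainder
kept_setdiff <- setdiff(x, range(x))

# A plain logical filter removes the range values but keeps duplicates
kept_filter <- x[x != min(x) & x != max(x)]

kept_setdiff  # 5 7
kept_filter   # 5 5 7
```

If the rows contain no repeated values, the two give the same standard deviation.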
akrun