R: Calculate SD in each row over a selection of columns after removing minimum and maximum

Question

I would like to compute the standard deviation for each row in a data frame over a selection of columns after removing the minimum and the maximum in that selection. Here is an example:

set.seed(1)
dat <- data.frame(matrix(sample(c(1:100), 10, replace=TRUE), ncol=5))

I managed to calculate the sd of my columns of interest (1:4) for each row:

dat <- transform(dat, sd = apply(dat[,1:4], 1, sd))
show(dat)

  X1 X2 X3 X4 X5       sd
1 27 58 21 95 63 33.95463
2 38 91 90 67  7 24.93324

However, I can't figure out how to exclude min(dat[1,1:4]) and max(dat[1,1:4]) before calculating sd(). The result should be this:

  X1 X2 X3 X4 X5       sd
1 27 58 21 95 63 21.92031     # note: computed by hand with sd(c(27,58))
2 38 91 90 67  7 16.26346     # note: computed by hand with sd(c(67,90))

Can someone help me with this?

What do you expect the result to be for `c(2, 2, 3, 4, 20, 20)`? Please edit your question. — jogo, Jan 06 '16 at 16:21
@jogo My data have many decimal places, so identical values like those in your example will not be a problem. — piptoma, Jan 08 '16 at 09:41

score 5 · Accepted Answer · answered Jan 06 '16 at 16:19

You could write a custom function for this. It takes a vector, drops every value equal to the minimum or the maximum, and returns the sd of what remains. You could also write it as an anonymous function, but a separate named function often makes the code more readable.

sd_custom <- function(x){
  # keep only the values strictly between the minimum and the maximum
  x <- x[x != min(x) & x != max(x)]
  return(sd(x))
}

dat$sd <- apply(dat[,1:4], 1, sd_custom)

> dat
  X1 X2 X3 X4 X5       sd
1 27 58 21 95 63 21.92031
2 38 91 90 67  7 16.26346
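
As raised in the comments on the question, `sd_custom` filters by value, so every copy of the minimum and maximum is dropped, not just one of each. A minimal sketch of that behaviour follows. The data frame is rebuilt by hand from the values printed above; that is an assumption made so the example does not depend on the random number generator, because the default behaviour of `sample()` changed in R 3.6.0 and newer versions will not reproduce these numbers from `set.seed(1)`. The tied vector is the one from jogo's comment.

```r
# Rebuild the example data by hand from the printed values (assumption:
# avoids depending on set.seed()/sample(), whose results vary by R version).
dat <- data.frame(X1 = c(27, 38), X2 = c(58, 91), X3 = c(21, 90),
                  X4 = c(95, 67), X5 = c(63, 7))

sd_custom <- function(x) {
  # Value-based filter: removes ALL entries equal to the min or the max
  x <- x[x != min(x) & x != max(x)]
  sd(x)
}

# Row-wise result; equals sd(c(27, 58)) and sd(c(67, 90))
res <- apply(dat[, 1:4], 1, sd_custom)

# With ties, both 2s and both 20s are removed, leaving c(3, 4)
tied <- c(2, 2, 3, 4, 20, 20)
res_tied <- sd_custom(tied)

# If fewer than two values survive the filter, sd() returns NA
res_short <- sd_custom(c(1, 5, 9))  # only 5 is left
```

Whether dropping all tied extremes is the desired behaviour depends on the data; for the all-distinct values in the question it makes no difference.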

score 3 · Answer 2 · answered Jan 06 '16 at 16:18

You could try this:

dat$sd <- apply(dat[1:4], 1, function(x) sd(x[-c(which.min(x), which.max(x))]))
dat
  X1 X2 X3 X4 X5       sd
1 27 58 21 95 63 21.92031
2 38 91 90 67  7 16.26346
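
Unlike a value-based filter, `which.min()` and `which.max()` each return a single position (the first match), so this version drops exactly one minimum and one maximum even when there are ties. The sketch below illustrates that, along with one edge case: when every value is identical, both calls return position 1 and only one element is removed. The helper names `drop_by_position` and `sd_trim_one` are hypothetical, introduced here for illustration only.

```r
# Same indexing as in the answer, wrapped in a (hypothetical) helper
drop_by_position <- function(x) x[-c(which.min(x), which.max(x))]

tied <- c(2, 2, 3, 4, 20, 20)
kept <- drop_by_position(tied)        # one 2 and one 20 remain: 2 3 4 20

flat <- c(5, 5, 5, 5)
kept_flat <- drop_by_position(flat)   # both positions are 1, so three values remain

# Hypothetical alternative: sort, then drop the first and last element.
# sort() discards NA by default, so missing values are ignored here.
sd_trim_one <- function(x) {
  s <- sort(x)
  if (length(s) < 4) return(NA_real_)  # need at least two values after trimming
  sd(s[-c(1, length(s))])
}

r1 <- sd_trim_one(c(27, 58, 21, 95))      # same as sd(c(27, 58))
r2 <- sd_trim_one(c(27, 58, NA, 21, 95))  # NA is dropped by sort() first
```

The sort-based variant is just one way to make the "exactly one of each" rule explicit; it could be passed to `apply()` in the same way as the anonymous function above.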

DatamineR

score 3 · Answer 3 · answered Jan 06 '16 at 16:19

We can modify your code by replacing `sd` with an anonymous function that keeps only the values strictly between the minimum and the maximum:

dat <- transform(dat, sd = apply(dat[,1:4], 1, function(x) sd(x[x<max(x) & x>min(x)])))

C_Z_

score 2 · Answer 4 · answered Jan 06 '16 at 17:00

Another option is to combine `range()` with `setdiff()`:

dat$sd <- apply(dat[1:4], 1, function(x) sd(setdiff(x,range(x))))
dat$sd
#[1] 21.92031 16.26346
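
One caveat worth knowing: `setdiff()` treats its arguments as sets, so in addition to removing the range it collapses duplicated values among the remaining entries. That is harmless for the all-distinct data in the question, but it changes the result when values repeat. A small illustration with a made-up vector:

```r
x <- c(1, 4, 4, 7, 10)  # made-up values with a repeated interior value

# setdiff() drops 1 and 10, but also collapses the two 4s into one
a <- setdiff(x, range(x))            # 4 7

# A value-based filter keeps both 4s
b <- x[x != min(x) & x != max(x)]    # 4 4 7

sd_a <- sd(a)   # sd(c(4, 7))
sd_b <- sd(b)   # sd(c(4, 4, 7)), a different value
```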

akrun
