1

Say I have this small data set in R.

Data = c(8,2,9,7,8,8,9,8,4,9,9,7,2,5,2,2,1,9,9,7)

So I am looking to get the average of the 2'nd 3'rd and 4'th smallest and largest values from the last 10 where all values the 10 and ranks should be variable.

In Excel the formula for the mins would be =AVERAGE(SMALL(A11:A20,ROW(INDIRECT("2:4")))) Entered with Ctrl+Shift+Enter.

And the expected result is

val min    max
8   NA     NA
2   NA      NA
9   NA      NA
7   NA      NA
8   NA      NA
8   NA      NA
9   NA      NA
8   NA      NA
4   NA      NA
9   6.3333  8.6667
9   6.3333  9
7   7.3333  9
2   6       8.6667
5   5.3333  8.6667
2   3.6667  8.6667
2   2.6667  8.6666
1   2       8
9   2       8.3333
9   2       9
7   2       8.3333

In r this seems fairly simple, I get the 10 or x values sort the values and then average or do whatever operation on the appropriate 3 values, which is discussed here

The problem is that I can only do this in a loop which is slow. Ho can I get such complicated operations in a vectored function?

Developer Guy
  • 2,318
  • 6
  • 19
  • 37
MichaelE
  • 763
  • 8
  • 22

1 Answers1

4

Don't know how much of a speed improvement is, but rollapply() from zoo is an option.

library(zoo)

mmavg <- function(x) {
    len <- length(x)
    sorted <- sort(x)
    c(min=mean(sorted[2:4]), max=mean(sorted[(len-3):(len-1)]))
}

cbind(val=Data, rollapplyr(Data, 10, mmavg, fill=NA))

#  val      min      max
#    8       NA       NA
#    2       NA       NA
#    9       NA       NA
#    7       NA       NA
#    8       NA       NA
#    8       NA       NA
#    9       NA       NA
#    8       NA       NA
#    4       NA       NA
#    9 6.333333 8.666667
#    9 6.333333 9.000000
#    7 7.333333 9.000000
#    2 6.000000 8.666667
#    5 5.333333 8.666667
#    2 3.666667 8.666667
#    2 2.666667 8.666667
#    1 2.000000 8.000000
#    9 2.000000 8.333333
#    9 2.000000 9.000000
#    7 2.000000 8.333333
AkselA
  • 8,153
  • 2
  • 21
  • 34
  • Thanks, that works great. Can you explain why I need rollapplyr vs rollapply. It sounds simple but I cannot find a good explanation when to use which one? Thank again. – MichaelE Apr 17 '18 at 21:14
  • 1
    @MichaelE: The 'r' just stands for 'right aligned'. rollapply(..., align="right") will work the same, the former is just a bit more compact. – AkselA Apr 17 '18 at 21:30
  • I have found an issue when applying it to larger data. Is there a reason rollapply fails for dataframes? cbind(val=as.data.frame(Data), rollapplyr(as.data.frame(Data), 10, mmavg, fill=NA)) gives me a long single vector instead of a matrix with a warning dim<-.zoo`(`*tmp*`, value = c(length(z), 1)). Any ideas or should I ask another question. – MichaelE Apr 18 '18 at 13:53
  • I got it, my data was a 1xn matrix instead or a vector so I needed to add , by.column = FALSE to rollapply and then it works because a matrix is not the same as a vector?? – MichaelE Apr 18 '18 at 14:09
  • Yes. Vectors, matrices and single column data frames are treated as separate data classes in R. It can get a bit confusing. – AkselA Apr 18 '18 at 17:58