I would like to compute a variant of rolling medians on my dataset that builds the subsets not by going k observations forward and backward, but by taking into account all observations that fall within a given time window.

A straightforward implementation could look like this:

windowwidth <- 30
median.window <- function(x) median(mydata[time <= x + windowwidth / 2 & time >= x - windowwidth / 2])
vapply(time, median.window, numeric(1))

However, as you can imagine, this is not very efficient for large datasets. Do you see a possible improvement, or know of a package providing an optimized implementation? You cannot expect the observations to be distributed evenly over time.

zoo provides rollmedian, but this function does not let you choose the window based on time, only on the observation count.

Thilo
  • If you add a toy dataset, that would help raise interest. See also [this question](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – Joris Meys Dec 13 '11 at 14:28
  • Since the "timestamp" for each observation is essentially, from your description, random, there is no a priori way to determine which observations fit into a given window. That said, I wonder whether using `outer()` with an appropriate time-width sort of function might at least build you a complete set of windowed sample sets. I'll have to go off and play with that. – Carl Witthoft Dec 13 '11 at 14:47

1 Answer

Ok, try this:

Rgames: timeseq<-1:5 
Rgames: winmat <- outer(timeseq,timeseq,FUN=function(x,y) y>=x &y<=x+2) 
Rgames: winmat 
      [,1]  [,2]  [,3]  [,4]  [,5] 
[1,]  TRUE  TRUE  TRUE FALSE FALSE 
[2,] FALSE  TRUE  TRUE  TRUE FALSE 
[3,] FALSE FALSE  TRUE  TRUE  TRUE 
[4,] FALSE FALSE FALSE  TRUE  TRUE 
[5,] FALSE FALSE FALSE FALSE  TRUE 
Rgames: winmat %*% timeseq 
     [,1] 
[1,]    6 
[2,]    9 
[3,]   12 
[4,]    9 
[5,]    5 

Replace that function with your window width and I think you'll be all set.
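As a rough sketch of what "replace that function with your window width" could look like for irregular timestamps, the snippet below uses a window centred on each observation. The names `time`, `value` and `windowwidth` and the toy numbers are assumptions made up for illustration, not data from the question. It also shows that the matrix product gives rolling sums (and hence means), but not medians.

```r
# Hypothetical toy data: irregularly spaced timestamps and their observed values
time  <- c(1, 2, 4, 9, 10)
value <- c(3, 0, 2, 6, 1)
windowwidth <- 4

# winmat[i, j] is TRUE when observation j lies within windowwidth / 2 of time[i]
winmat <- outer(time, time, FUN = function(x, y) abs(y - x) <= windowwidth / 2)

# Number of observations in each window
counts <- rowSums(winmat)

# Rolling sum via the matrix product, then divide by the counts for a rolling mean
rollsum  <- as.vector(winmat %*% value)
rollmean <- rollsum / counts
```

Note that `outer()` builds a full n-by-n matrix, so memory use grows quadratically with the number of observations.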
Edit: In response to Thilo's query, it looks like in the general case you should use apply. Given the stuff above, call your observation values "timval", as follows:

Rgames: timval<-c(3,4,2,6,1)
Rgames: valmat<-timval*t(winmat)
Rgames: valmat
     [,1] [,2] [,3] [,4] [,5]
[1,]    3    0    0    0    0
[2,]    4    4    0    0    0
[3,]    2    2    2    0    0
[4,]    0    6    6    6    0
[5,]    0    0    1    1    1
Rgames: apply(valmat,2,median)
[1] 2 2 1 0 0

Edit again: clearly I was asleep there: nobody wants a median based on all those zeroes. I should think more before posting. Add this:

valmat[valmat==0]<- NA
apply(valmat,2, median, na.rm=TRUE)
[1] 3.0 4.0 2.0 3.5 1.0

And I'm sure there's a cleaner way of 'building' valmat than this, but the final result is the "filter matrix" you want to apply any function to.
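One possible cleaner way of building valmat, sketched here under assumptions, is to fill the out-of-window cells with NA directly instead of multiplying by the logical matrix and then replacing zeroes. That way an observation whose value really is 0 is not thrown away. The names `time`, `value`, `windowwidth` and `inwin`, and the toy numbers, are hypothetical.

```r
# Hypothetical toy data: irregular timestamps; note the genuine zero in value
time  <- c(1, 2, 4, 9, 10)
value <- c(3, 0, 2, 6, 1)
windowwidth <- 4

# inwin[i, j] is TRUE when observation i lies in the window centred on time[j]
inwin <- abs(outer(time, time, "-")) <= windowwidth / 2

# Each column starts as a full copy of value; cells outside the window become NA
valmat <- matrix(value, nrow = length(value), ncol = length(value))
valmat[!inwin] <- NA

# Column j now holds exactly the observations in window j
rollmed <- apply(valmat, 2, median, na.rm = TRUE)
```

Any other summary function that accepts `na.rm = TRUE` can be swapped in for `median` in the last line.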

Carl Witthoft
  • +1 -- Very nice. I always appreciate the elegance of `outer`-based solutions! (BTW, hope you don't mind my edit to your answer. I only did it b/c I knew you could change it back if you do.) – Josh O'Brien Dec 13 '11 at 16:58
  • Hmph- whatever you edited is not immediately obvious to the unaided eye :-), so I can hardly complain. – Carl Witthoft Dec 13 '11 at 18:08
  • If you're ever interested in looking at edits, you can see them by clicking on the 'edited X hour/day ago' link above the editor's name (here Josh O'Brien). Cheers. – Josh O'Brien Dec 13 '11 at 18:11
  • Thanks Carl. However, how can I get the median based on this solution? I see how to compute rolling means, but for medians my first thought is that I would still be required to use one of the apply functions, now with precomputed filters. Did you have another idea in mind? – Thilo Dec 13 '11 at 18:18