1

I am trying to figure out an online algorithm for "time aware" exponential moving average, sampled at varying times. By "time aware" I mean something like "giving more weight to data sampled at similar time of day", but (a) I'll give a more precise definition and (b) this is only an example for something more general that interests me.

I'll start by defining "time aware" by giving a precise example that assumes that data is sampled in constant intervals during the day; say, every 1 hour. In that case, I keep 24 different EMAs, and whenever data is sampled, I put it into the relevant EMA, taking its result and putting it in a general EMA of the results. So, at 12:00, Tuesday, I get the result of the EMA of the EMA results for 12:00, 11:00, 10:00, etc. where the EMA result for 12:00 is the EMA of some typical period of x days of data sampled at 12:00, etc.

This is an online algorithm which works well and provides reasonable results for the case where the data is sampled in constant time intervals. Without that assumption, its results become meaningless, or perhaps it is not even well defined.


The more general case can be described so: at a given moment I have a set of samples, each is a tuple (x,v) where x is some sample invariant (can be thought of as the sampling "location") and v is the sampling "value", and I would like to find out the (weighted) average at some "location" y, where the weights have negative correlation to the distances of y from x. This generalizes the previous problem by letting x be the pair (t,d) where t is the sampling time and d is the time-of-day (hour, in our case), and by defining some metric on the set of all such tuples which will describe well our needs. A reasonable demand would be to decide that if d is constant, the weight function on the distances will be similar to that of exponentially moving average (perhaps a continuous version of it).

The main problem is finding an efficient online algorithm that does the work in the general case, or define a specific metric which allows such an efficient online algorithm, or show that in almost any interesting case it is impossible.

Bach
  • 6,145
  • 7
  • 36
  • 61
  • See http://www.eckner.com/papers/unevenly_spaced_time_series_analysis.pdf Section 8.2 and more relevant links in answers to the question I marked as possible duplicate. – Lior Kogan Dec 24 '13 at 18:30
  • possible duplicate of [Exponential Moving Average Sampled at Varying Times](http://stackoverflow.com/questions/1023860/exponential-moving-average-sampled-at-varying-times) – Lior Kogan Dec 24 '13 at 18:32
  • @LiorKogan: I have read that question, of coruse. It has similarities, but it is surely not a duplicate. – Bach Dec 25 '13 at 07:17
  • @LiorKogan: the article you've sent is indeed interesting, and the relevant section indeed treats moving averages of data sampled. However, it is not answering my question at all. I was looking for something that gives custom importance to sample according to some weight/distance function. – Bach Dec 25 '13 at 07:52

1 Answers1

1

EMA is essentially weighted average. When you combine several weighted averages with some weights you get a new weighted average with weights equal to products. This is exactly what you got with "time aware" EMA.

Of course, you can generalize it widely by assigning (almost arbitrary) weight as a function of "t".

As for online algorithm, you apparently want to add new points with very little efforts. EMA works nicely in this respect because EMA(x_1,...,x_n+1) = a*EMA(x_1,..., x_n) + (1-a)*x_n. You can find a lot of similar formulas for cases where weights have some symmetries or recursions (aka "group property"). Most likely, your recursive formula will have more summands in this case.

Michael Simbirsky
  • 3,045
  • 1
  • 12
  • 24
  • Thanks for your answer. There is still a slight difference between a combination of weighted averages an my problem; in my problem, the weights of the samples varies with regard to where you check; that is, at 12:00, the weights of samples from around 12:00 will be higher than those of 13:00, but one hour later it will be the opposite. – Bach Dec 27 '13 at 08:01
  • Well, I take it back. The wide generalization you've proposed catches that as well. So it is only left to figure out whether I can find something that catches the essence of my problem, but that has some symmetries which will allow efficient online algorithm. – Bach Dec 27 '13 at 08:05