There's an efficient way to keep track of the minimum (or maximum) value within a given time window without usually having to store all the numbers that have arrived within that window. (However, the worst case scenario still does require storing all the numbers, so you need to reserve space for all of them or accept that you may sometimes get incorrect results.)
The trick is to only store values that:
- have arrived within the time window, and
- are smaller (or larger) than any later value.
A suitable data structure for implementing this is a simple circular buffer storing values and their arrival times. You'll need to maintain two indices into the buffer. Here's a simple English description of the algorithm:
At startup:
- Allocate an N-element buffer
val
of values and a corresponding N-element buffer time
of timestamps.
- Let
imax
= 0 (or any other value between 0 and N−1 inclusive) and let inext
= imax
. This indicates that the buffer is currently empty.
When a new value new
is received at time t
:
- While
imax
≠ inext
and time[imax]
is outside the interval, increment imax
by one (modulo N).
- While
imax
≠ inext
and val[inext-1]
≥ new
, decrement inext
by one (modulo N).
- Let
val[inext]
= new
and time[inext]
= t
.
- If
inext
≠ imax-1
, increment inext
by one (modulo N); else deal with the "buffer full" condition appropriately (e.g. allocate a larger buffer, throw an exception, or just ignore it and accept that the last value was not correctly recorded).
When the minimum value is requested:
- While
imax
≠ inext
and time[imax]
is outside the interval, increment imax
by one (modulo N).
- If
imax
≠ inext
, return val[imax]
; else return an error indicating that no values have been received within the time interval.
If the values received are independent and identically distributed (and arrive as a Poisson process), I believe it can be shown that the average number of values stored in the list at any given time is ln(n+1), where n is the average number of values received within the time interval. For n = 50,000, ln(n+1) ≈ 10.82. However, one should keep in mind that this is only the average, and that several times more space may occasionally be required.
For the average, the same trick unfortunately doesn't work. If possible, you could switch to an exponentially moving average, which can be easily tracked using very little space (just one number for the average and one timestamp indicating when it was last updated).
If that's not possible, but you're willing to accept a small amount of smoothing in the average values, you could calculate an average over, say, each millisecond. That way, whenever an average of the values over the last second is requested, you can just take an average of the last 1001 millisecond averages, weighing the oldest and newest of them according to how much of those milliseconds lie within the interval:
At startup:
- Let interval be the length of the time interval to average over, and let n be the number of subintervals.
- Let dt = interval / n.
- Allocate an n+1 -element buffer
sum
of values and an n+1 -element buffer cnt
of non-negative integers, and fill both with zeros.
- Let
prev
have any value. (It doesn't really matter.)
When a new value new
is received at time t
:
- Let
i
= floor(t
/ dt) mod (n+1).
- If
i
≠ prev
:
- Subtract
sum[i]
from total
and cnt[i]
from count
.
- Let
sum[i]
= 0, cnt[i]
= 0 and let prev
= i
.
- Add
new
to sum[i]
and increment cnt[i]
by one.
- Add
new
to total
and increment count
by one.
When the average value is requested at time t
:
- Let
i
= floor(t
/ dt) mod (n+1).
- If
i
≠ prev
:
- Subtract
sum[i]
from total
and cnt[i]
from count
.
- Let
sum[i]
= 0, cnt[i]
= 0, and let prev
= i
.
- Let
j
= (i
−n) mod (n+1) = (i
+1) mod (n+1).
- Let
w
= frac(t
/ dt) = (t
/ dt) − floor(t
/ dt).
- Return (
total
− w
× sum[j]
) / (count
− w
× cnt[j]
).