I have a stream of int values arriving at a certain rate. Every 5 minutes, I'd like to compute some percentiles from the values, and start over.
The problem: I don't want to waste too much memory, so I'd like to keep only a few KBs for the values. If my buffer does not fill during 5 minutes, I can compute the percentiles perfectly. If the buffer does fill however, I'd like to start dropping some values (possibly using reservoir sampling and random eviction as suggested here - Percentiles of Live Data Capture). Unfortunately I can't find a solution that works well in both scenarios - if the buffer is not full, I don't want to evict or ignore values, and once it gets full and I start evicting I invariably introduce bias.