1

Is there a java library that allows me to update rather than recompute quantiles of a large sample set of data with addition/removal of data points ? My guess is that an efficient algorithms should take a constant time for the update (not a function of the number of points already existing).

Known algorithms are listed but dont' have a way of removing points from the sample set :

Here is a sample problem: Say I want to compute say an arbitrary but constant percentile fan speed of a set of windmills (as an estimate of the wind speed). The fan speeds are updated asynchronously every few milliseconds. This library should allow me to update the wind speeds of one windmill at a time without having to recalculate the median.

fodon
  • 4,565
  • 12
  • 44
  • 58

2 Answers2

2

If you maintain an updatable sorted representation of the data, getting the quantiles is easy and efficient just by using the length of your array. For example, if you have N elements then the median will be at position N/2, and so on. When you insert a new element into your data structure, this will still hold. The efficiency is then just dependent on inserting a new element.

Bitwise
  • 7,577
  • 6
  • 33
  • 50
  • yes, this is theoretically easy, but I'm fining it hairy. So, is there a library or something that does this ... want to avoid writing code and testing. – fodon Oct 17 '12 at 17:05
1

You could have multiple batches of data. You can combine the percentiles/quartiles of these batches to estimate an aggregate. The benefit is that you can discard a number of batches efficiently without having to re-calculate the other batches.

Peter Lawrey
  • 525,659
  • 79
  • 751
  • 1,130
  • The batch idea would work for stats on the same object, but this is stats on a collection of objects ... added an example to the question. – fodon Oct 17 '12 at 16:57
  • Is it that you want to add/remove one at a time? You can do this by keeping a ring buffer and a count of the number of samples. To remove decrement the count for the removed value and to add increment the count of the added value. – Peter Lawrey Oct 17 '12 at 17:00
  • yes, that would do, but the percentile would have to be computed each time no? – fodon Oct 17 '12 at 17:02
  • There is probably a way to reduce how often you need to re-calculate it if the quartiles don't change very often. – Peter Lawrey Oct 17 '12 at 17:05