1

I have a monotonically increasing sequence of (argument, value) pairs of integers. For example:

seq=[(0, 0), (1, 5), (10, 20), (15, 24)].

I also have an integer argument a that is greater than the largest argument in the sequence (a > seq[-1][0]), and I want to estimate the value corresponding to it. The sequence grows nearly linearly, and earlier values matter less than later ones. Nevertheless, I can't simply take the last two points and extrapolate from them, because errors in individual points are very likely and the slope of the curve may change.

Can anyone suggest a simple solution for this kind of task in Python?

Levon
Fedor
  • Use the most recent five points. Or 10. Or however many recent points you want. – robert May 26 '12 at 18:38
  • Related: http://stackoverflow.com/a/488941/4279 – jfs May 26 '12 at 18:45
  • What do you mean by "the curve may change the angle"? Gradually or suddenly? If gradually, then simply take the last 10 points (as suggested in another comment), calculate the slopes of the segments between them, throw away the two outliers and take the average of the rest. Another option, a common polygon smoothing technique, is to take the last 10 segments, replace each with its midpoint, and repeat for the resulting 9 segments, then 8, and so on. – Will Ness May 26 '12 at 19:06

2 Answers

3

There are many issues here. Extrapolation is a nasty business to start with. Do you assume a linear extrapolant? Polynomial models (beyond linear) generally extrapolate very poorly. Or should you assume some sort of extrapolant that is asymptotic to a line? What matters is what you are willing to assume, and what information you can bring to the modeling process.

If you can assume a linear extrapolant, then I might do a weighted least squares fit, with a straight line model with decreasing weights as you move away from the endpoint. (In fact, no matter what model you end up posing, a weighted least squares estimation seems logical, with the weights a function of position.)
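As a rough illustration of the weighted straight-line fit described above, here is a minimal pure-Python sketch. The exponential weighting in x and the `scale` parameter are assumptions made for this example, not something prescribed by the answer; any weights that decrease away from the endpoint would fit the same pattern.

```python
import math


def weighted_linear_fit(points, scale=10.0):
    """Fit y = intercept + slope * x by weighted least squares.

    Assumption: a point at distance dx from the last argument gets
    weight exp(-dx / scale), so recent points dominate the fit.
    """
    x_end = points[-1][0]
    weights = [math.exp(-(x_end - x) / scale) for x, _ in points]
    total = sum(weights)
    # Weighted means of x and y.
    mx = sum(w * x for w, (x, _) in zip(weights, points)) / total
    my = sum(w * y for w, (_, y) in zip(weights, points)) / total
    # Weighted covariance over weighted variance gives the slope.
    sxy = sum(w * (x - mx) * (y - my) for w, (x, y) in zip(weights, points))
    sxx = sum(w * (x - mx) ** 2 for w, (x, _) in zip(weights, points))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


def extrapolate(points, x_new, scale=10.0):
    """Evaluate the weighted fit at x_new."""
    intercept, slope = weighted_linear_fit(points, scale)
    return intercept + slope * x_new


seq = [(0, 0), (1, 5), (10, 20), (15, 24)]
estimate = extrapolate(seq, 20)
```

A smaller `scale` makes the fit follow the last few points more closely; a larger one brings it closer to an ordinary unweighted fit.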

Alternatively, suppose you choose to pose a nonlinear model that is something like

y = a + bx + c*exp(-d*x)

Provided d > 0, this model asymptotically approaches the straight line y = a + bx, with slope b, as x gets large. You might still use a weighted fit that discounts the points far from the end you are interested in.
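To make the asymptotic behaviour concrete, the following sketch evaluates the model and its limiting line at a few arguments. The parameter values are hypothetical, picked only for illustration; in practice they would come from a (weighted) nonlinear fit, which is not shown here.

```python
import math


def model(x, a, b, c, d):
    """y = a + b*x + c*exp(-d*x); for d > 0 the last term decays to zero."""
    return a + b * x + c * math.exp(-d * x)


def asymptote(x, a, b):
    """The straight line the model approaches as x grows (when d > 0)."""
    return a + b * x


# Hypothetical parameters, not fitted to any data.
a, b, c, d = 10.0, 1.0, -10.0, 0.3

# Gap between the model and its asymptote at increasing arguments.
gaps = [abs(model(x, a, b, c, d) - asymptote(x, a, b)) for x in (0, 5, 10, 20, 40)]
```

The gap equals |c|*exp(-d*x), so it shrinks geometrically; far enough out, extrapolating with this model is the same as extrapolating with the line of slope b.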

Again, long distance extrapolation is a difficult thing to attempt. Remember the words of Mark Twain...

“In the space of one hundred and seventy-six years the Lower Mississippi has shortened itself two hundred and forty-two miles. That is an average of a trifle over a mile and a third per year. Therefore, any calm person, who is not blind or idiotic, can see that in the Old Oölitic Silurian Period, just a million years ago next November, the Lower Mississippi was upwards of one million three hundred thousand miles long, and stuck out over the Gulf of Mexico like a fishing-pole. And by the same token any person can see that seven hundred and forty-two years from now the Lower Mississippi will be only a mile and three-quarters long, and Cairo [Illinois] and New Orleans will have joined their streets together and be plodding comfortably along under a single mayor and a mutual board of aldermen. There is something fascinating about science. One gets such wholesale returns of conjecture out of such a trifling investment of fact.” "Life on the Mississippi", Mark Twain, 1883

0

If the sequence does not have a lot of noise, just use the latest point and the point whose argument is about 1/3 of the latest argument, and estimate your line from those two. Otherwise do something more complicated, like a least squares fit over the latter half of the sequence.
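Both suggestions can be sketched in a few lines of plain Python. How to pick "the point for 1/3 of the current" is an interpretation: the sketch below assumes it means the earlier point whose argument is closest to one third of the latest argument. The function names are made up for this example.

```python
def two_point_estimate(points, x_new):
    """Line through the latest point and the earlier point whose x is
    closest to one third of the latest x (an assumed reading)."""
    x_last, y_last = points[-1]
    target = x_last / 3
    x_ref, y_ref = min(points[:-1], key=lambda p: abs(p[0] - target))
    slope = (y_last - y_ref) / (x_last - x_ref)
    return y_last + slope * (x_new - x_last)


def latter_half_estimate(points, x_new):
    """Ordinary least squares line through the latter half of the points."""
    half = points[len(points) // 2:]
    if len(half) < 2:
        half = points[-2:]  # a line needs at least two points
    n = len(half)
    mx = sum(x for x, _ in half) / n
    my = sum(y for _, y in half) / n
    slope = (sum((x - mx) * (y - my) for x, y in half)
             / sum((x - mx) ** 2 for x, _ in half))
    return my + slope * (x_new - mx)


seq = [(0, 0), (1, 5), (10, 20), (15, 24)]
quick = two_point_estimate(seq, 20)
smoothed = latter_half_estimate(seq, 20)
```

Both functions assume at least two points with strictly increasing arguments, as in the question's example.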

If you search on Google, there are a number of code samples for doing the latter, and some modules that may help. (I'm not a Python programmer, so I can't give a meaningful recommendation for the best one.)

btilly