Consider a pandas.Series with a datetime index:

time_series
0
2019-02-26 01:06:06.237     12.494133
2019-02-26 01:06:10.407     10.175791
2019-02-26 01:06:14.390      6.560483
2019-02-26 01:06:18.547      3.663422
2019-02-26 01:06:23.127     12.443180
2019-02-26 01:06:34.407     12.447814
2019-02-26 01:06:38.563     12.437152
2019-02-26 01:06:42.877      8.149799
2019-02-26 01:06:46.457     12.465176
2019-02-26 01:06:50.470     12.655360
2019-02-26 01:06:54.910     12.633331
2019-02-26 01:06:58.800     12.744521
2019-02-26 01:07:02.533      6.188236
2019-02-26 01:07:22.520     12.637320
2019-02-26 01:07:26.613     12.581206
2019-02-26 01:07:30.880     12.624784
2019-02-26 01:07:35.160      6.638114
2019-02-26 02:09:42.270      0.000000
2019-02-26 02:20:17.010     94.643995
2019-02-26 02:20:17.903    105.356006
2019-02-26 02:33:33.070      7.597106
2019-02-26 02:43:12.870     75.000000
2019-02-26 03:09:24.157      2.000000
2019-02-26 03:42:46.613     18.122552
2019-02-26 03:42:47.223     31.870000
2019-02-26 03:42:47.270      0.007448
2019-02-26 03:51:51.120      1.860013
Name: 1, dtype: float64

Then apply a rolling sum to this series:

time_series.rolling('3600s').sum()
0
2019-02-26 01:06:06.237    1.249413e+01
2019-02-26 01:06:10.407    2.266992e+01
2019-02-26 01:06:14.390    2.923041e+01
2019-02-26 01:06:18.547    3.289383e+01
2019-02-26 01:06:23.127    4.533701e+01
2019-02-26 01:06:34.407    5.778482e+01
2019-02-26 01:06:38.563    7.022198e+01
2019-02-26 01:06:42.877    7.837177e+01
2019-02-26 01:06:46.457    9.083695e+01
2019-02-26 01:06:50.470    1.034923e+02
2019-02-26 01:06:54.910    1.161256e+02
2019-02-26 01:06:58.800    1.288702e+02
2019-02-26 01:07:02.533    1.350584e+02
2019-02-26 01:07:22.520    1.476957e+02
2019-02-26 01:07:26.613    1.602769e+02
2019-02-26 01:07:30.880    1.729017e+02
2019-02-26 01:07:35.160    1.795398e+02
2019-02-26 02:09:42.270   -2.131628e-14
2019-02-26 02:20:17.010    9.464399e+01
2019-02-26 02:20:17.903    2.000000e+02
2019-02-26 02:33:33.070    2.075971e+02
2019-02-26 02:43:12.870    2.825971e+02
2019-02-26 03:09:24.157    2.845971e+02
2019-02-26 03:42:46.613    9.512255e+01
2019-02-26 03:42:47.223    1.269926e+02
2019-02-26 03:42:47.270    1.270000e+02
2019-02-26 03:51:51.120    5.386001e+01
Name: 1, dtype: float64

Now there is a negative value (at 02:09:42.270) in a rolling sum of non-negative values! What is going on? Is it just a machine accuracy issue, or is there a pandas bug?

The negative value is rather small (-2.131628e-14), but I tested other time series, and the absolute value of the negative values in the rolling sum reached 5e-10 in some cases. That looks strange for the result of arithmetic operations on float64.

If we calculate the rolling sum manually, we obtain 0.0 or NaN, and I see no way to obtain a small negative value even when taking floating-point math into account.
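To make the manual calculation concrete, here is a minimal sketch in plain Python (standard library only). It takes a few rows copied from the series above and sums each window from scratch. The helper name `rolling_sum_direct` is made up for this illustration, and the window is assumed to be `(t - 3600s, t]`, i.e. closed on the right.

```python
from datetime import datetime, timedelta

# A few rows copied from the series above (printed precision only).
rows = [
    ("2019-02-26 01:07:26.613", 12.581206),
    ("2019-02-26 01:07:30.880", 12.624784),
    ("2019-02-26 01:07:35.160", 6.638114),
    ("2019-02-26 02:09:42.270", 0.0),
    ("2019-02-26 02:20:17.010", 94.643995),
]
times = [datetime.fromisoformat(stamp) for stamp, _ in rows]
values = [value for _, value in rows]


def rolling_sum_direct(times, values, window):
    """Sum every window (t - window, t] from scratch, with no running total."""
    result = []
    for t in times:
        start = t - window
        result.append(sum(v for s, v in zip(times, values) if start < s <= t))
    return result


direct = rolling_sum_direct(times, values, timedelta(seconds=3600))
for t, total in zip(times, direct):
    print(t, total)
```

Because only five rows are used, the first three sums differ from the full output above. The window ending at 02:09:42.270 contains only the single value 0.0, so summing it directly gives exactly 0.0.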

  • Possible duplicate of [Is floating point math broken?](https://stackoverflow.com/questions/588004/is-floating-point-math-broken) – BoarGules Mar 12 '19 at 08:57
  • @BoarGules I see no way to apply floating point math issues to the case I described above. – Yuri Bogdanov Mar 12 '19 at 12:48
  • Your datatype is `float64`. That *specifies* floating-point calculations. `pandas` uses the same underlying floating-point hardware that Python itself does. I don't understand why you think that what you are doing should be somehow exempt from the limitations of the hardware on which you are running your calculations. – BoarGules Mar 12 '19 at 12:53
  • @BoarGules you're right, it is because of floating-point operations. The window function in pandas behaves this way because it is computed with a recurrence: values that are no longer in the new window are subtracted and new ones are added. Thus, the longer the series, the more error can accumulate. – Yuri Bogdanov Mar 12 '19 at 13:43
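
The recurrence described in the last comment can be sketched in plain Python. This is a simplified model for illustration, not pandas's actual implementation: the function name `rolling_sum_recurrence`, the float-second timestamps, and the data are all hypothetical. The values are deliberately extreme so that the effect is easy to see.

```python
from collections import deque


def rolling_sum_recurrence(times, values, window):
    """Rolling sum over (t - window, t], updated incrementally.

    Simplified model for illustration only, not pandas's actual code.
    """
    in_window = deque()  # (time, value) pairs currently inside the window
    running = 0.0
    result = []
    for t, v in zip(times, values):
        running += v  # add the new observation
        in_window.append((t, v))
        # Subtract observations that have fallen out of the window.
        while in_window[0][0] <= t - window:
            _, old = in_window.popleft()
            running -= old
        result.append(running)
    return result


# Hypothetical data: times in seconds, all values non-negative.
times = [0.0, 1.0, 5000.0]
values = [1e16, 1.0, 0.0]

incremental = rolling_sum_recurrence(times, values, window=3600.0)
print(incremental)  # the last entry is -1.0, although its window holds only 0.0
```

Here `1e16 + 1.0` rounds back to `1e16`, so the `1.0` is lost when it is added but subtracted in full when it leaves the window. With values of ordinary magnitude, like those in the question, the same mechanism would leave much smaller residuals, on the order of the rounding error of the largest running total.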
