
I am new to NumPy and I am vectorizing a homemade function that uses NumPy's cumsum() for its calculation. However, I found that the result has some rounding error, and the culprit seems to be cumsum(). It can be shown as follows:

    import numpy as np

    X = np.repeat(30000.1, 114569)
    X1 = X**2
    Y = np.cumsum(X1)

    Z = Y[1:] - Y[:-1]
    print(f'Theoretical value: 900006000.01')
    print(f'first member of X1: {X1[0]}')
    print(f'first member of Z: {Z[0]}')
    print(f'last member of Z: {Z[-1]}')
    print('They are mathematically the same, but in reality they are different')

The result is:

    Theoretical value: 900006000.01
    first member of X1: 900006000.0099999
    first member of Z: 900006000.0099999
    last member of Z: 900006000.015625
    They are mathematically the same, but in reality they are different
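
For scale: the cumulative sums near the end of Y are around 1e14, and the spacing between adjacent float64 values at that magnitude matches the error above. This quick check is my own addition, not part of the function I am vectorizing:

    import numpy as np

    # rough magnitude of the last cumulative sum: 900006000.01 * 114569 ~ 1.03e14
    total = 900006000.01 * 114569
    # np.spacing gives the gap to the next representable float64 at that magnitude
    print(np.spacing(total))   # 0.015625, the same size as the error in the last member of Z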

Here are my questions:

1) Are there any ways to improve the precision of cumsum?

2a) If there are, can you show me a simple example?

2b) If not, then what is the known maximum size of a float64, or the maximum length of the argument vector, before cumsum runs into rounding errors?

3) Are there any packages in Python to handle calculations with high-precision floating-point numbers? (A rough sketch of what I have in mind follows these questions.)
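
To make 2a) and 3) concrete, here is a rough sketch of the kinds of things I have been experimenting with. The np.longdouble and decimal approaches are my own guesses, and I am not sure either is the right tool:

    import numpy as np
    from decimal import Decimal

    X = np.repeat(30000.1, 114569)
    X1 = X**2

    # Guess 1: accumulate in extended precision. On many Linux/x86 builds
    # np.longdouble is 80-bit extended precision, but on some platforms
    # (e.g. Windows) it is the same as float64, so any gain is platform-dependent.
    Y_ld = np.cumsum(X1, dtype=np.longdouble)
    Z_ld = Y_ld[1:] - Y_ld[:-1]
    print(Z_ld[0], Z_ld[-1])

    # Guess 2: arbitrary-precision arithmetic with the standard-library decimal
    # module (exact for this value, but far too slow for 100k+ elements).
    print(Decimal('30000.1') ** 2)   # 900006000.01 exactly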

Thank you in advance

EDIT: changed the numbers to fewer decimal places to emphasize the issue. The rounding error appears as early as the second decimal place, which I think is a really big problem.

EDIT2: some people pointed out that the subtraction between big numbers also contributes to the error. So another question: is there any way in Python to handle the numerical error that stems from subtracting two big numbers?
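
The only thing I have come up with so far is a plain-Python compensated (Kahan) cumulative sum, sketched below. It keeps each running sum as accurate as float64 allows, but each stored sum is still rounded to the ~0.016 grid at this magnitude, so the differences may still be off, and the Python loop is far too slow for my use case:

    import numpy as np

    def kahan_cumsum(a):
        # Compensated (Kahan) summation: carry the low-order bits that a
        # naive running sum would drop. Pure Python, so it is slow.
        out = np.empty(len(a), dtype=np.float64)
        s = 0.0
        c = 0.0                      # running compensation
        for i, x in enumerate(a):
            y = x - c
            t = s + y
            c = (t - s) - y          # what was lost when adding y to s
            s = t
            out[i] = s
        return out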

  • Floating point will pretty much always have rounding error. You can't fix it by throwing more precision at the problem. A rounding error of one part in a nonillion isn't going to be more useful to you than a rounding error of one part in a quadrillion. – user2357112 May 22 '19 at 02:46
  • In this particular case `cumsum` is not to blame. What you are seeing is a (mild) case of loss of significance as you are subtracting two similar size large numbers. Indeed, floating point numbers of oom `X[0]**2*size` have a resolution of `~0.016`. – Paul Panzer May 22 '19 at 02:51
  • @user2357112 Edited the post to better emphasize the problem here. The rounding error appears as early as the second decimal place, which is a serious problem. – mathguy May 22 '19 at 03:00
  • @PaulPanzer But there are quite a few operations (I can't name them just yet, but I bet they exist) requiring subtractions between big numbers in Python, so how did people solve these issues? There must be some way to handle it; I just have no idea how, and that's why I created this post. – mathguy May 22 '19 at 03:03
  • Restructure your computation in a more numerically stable form. – user2357112 May 22 '19 at 04:35
  • I was using np.cumsum to calculate the vectorized mean squared deviation of 100k+ data points a few million times, so performance is really important. Without np.cumsum the operation would take 100 times longer than it should. I haven't found a numerically stable way to vectorize the function just yet. Maybe I should create another post for it. – mathguy May 23 '19 at 03:43
