0

I find that numpy.diff gives an inaccurate result, which can be shown by the following code snippet.

import numpy as np

# test data
arr_a = np.linspace(10**8,10**9,500,dtype=np.double)
arr_b = np.linspace(10**9,10**10,500,dtype=np.double)
arr_c = np.linspace(10**10,10**11,500,dtype=np.double)

# results
plt.figure(figsize=(15,5))

plt.plot( np.diff(arr_c,n=2), "r", linewidth=1.0,label=r"arr_c")
plt.plot( np.diff(arr_b,n=2), "g", linewidth=2.0,label=r"arr_b" )
plt.plot( np.diff(arr_a,n=2), "b", linewidth=2.0,label=r"arr_a" )
plt.legend()

The output is shown below enter image description here From this figure, we see that np.diff(arr_b,n=2) and np.diff(arr_c,n=2) are very noisy, which is far from the theoretical value (i.e., zero). Why does this happen?

Stephen Wong
  • 101
  • 12
  • If you want higher precision you should use mpmath module. – Ahmed AEK Sep 17 '22 at 14:43
  • @AhmedAEK Can you show me a simple example? – Stephen Wong Sep 17 '22 at 14:44
  • 1
    [check the documentation](https://mpmath.org/doc/current/calculus/differentiation.html) – Ahmed AEK Sep 17 '22 at 14:52
  • 1
    The min/max values of `np.diff(arr_b,2)` are +- 3.81e-6. Divide by 10**9 we get values 3.e-15. For doubles that is virtually 0. `arr_b[1]` is `1018036072.1442885`. If we generated 501 values (instead of 500), `b[1]` displays as `1018000000.0`, and the min/max diffs are `0`. So the noise is mostly due to how `linspace` creates 500 numbers - that combined with the limited precision of float point values. – hpaulj Sep 17 '22 at 15:30
  • Look at a smaller `arr_b = np.linspace(0,10,10)` and its diff. Looking at the raw numbers should tell you more than a plot of 500. – hpaulj Sep 17 '22 at 15:33

0 Answers0