Let's consider a very basic example:

import numpy as np
recon_x = ([int(x) for x in np.linspace(1, 12288, num = 12288)])
recon_x = np.array([recon_x]).reshape(-1, 12288)

x = ([int(x) for x in np.linspace(230, 13000, num = 12288)])
x = np.array([x]).reshape(-1, 12288)

I want to calculate the sum of squared differences between x and recon_x, using this code:

np.sum((x - recon_x) ** 2)

But it returns the wrong result:

-1341621451

which is of course incorrect, since a sum of squares cannot be negative. Do you see why this happens?

John
  • Arithmetic overflow. The quantity is too big to fit into `numpy` int and overflows, in the process setting the sign bit. – BoarGules Apr 23 '22 at 11:31
  • Consider specifying the `dtype` of array so to avoid this. This is a good practice to avoid overflow and improve performance. – Jérôme Richard Apr 23 '22 at 11:33
  • The fun thing is, since you're converting everything to a list of integers (before turning that into a NumPy array again), doing the last calculation in pure Python would get you the correct answer: `sum((item1 - item2)**2 for item1, item2 in zip(recon_x, x))` (remove the two lines with reshape), which is 2953345845, since Python can handle big integers. – 9769953 Apr 23 '22 at 11:40
  • Of course, `np.array([x]).reshape(-1, 12288)` seems very odd. Why not use `np.array(x)`? Better yet, why not use `x = np.linspace(1, 12288, num=12288, dtype=np.int64)` and remove the indirect conversion to integers and reshaping altogether? – 9769953 Apr 23 '22 at 11:43

1 Answer

This is an overflow issue: the sum is too large for the integer type the arrays use by default. You can set the type of the NumPy arrays explicitly with dtype so that a wider number format is used.

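import numpy as np

# Build the arrays as float64 so the sum no longer overflows
recon_x = ([int(x) for x in np.linspace(1, 12288, num = 12288)])
recon_x = np.array([recon_x], dtype="float64").reshape(-1, 12288)

x = ([int(x) for x in np.linspace(230, 13000, num = 12288)])
x = np.array([x], dtype="float64").reshape(-1, 12288)

# Same expression as in the question; the result is no longer negative
print(np.sum((x - recon_x) ** 2))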
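
To see where the negative number comes from, note that -1341621451 + 2**32 = 2953345845, the value reported in the comments, so the sum appears to have wrapped around once in a 32-bit signed integer. The sketch below tries to reproduce that on any platform by forcing a 32-bit accumulator; the names `squared`, `wrapped` and `exact` are mine, and treating the asker's default integer as `int32` is an assumption based on that arithmetic.

```python
import numpy as np

# Same values as in the question, built without the list round-trip.
# astype() truncates toward zero, like int() does.
recon_x = np.linspace(1, 12288, num=12288).astype(np.int64)
x = np.linspace(230, 13000, num=12288).astype(np.int64)

# Each individual squared difference is small; only the total is large.
squared = (x - recon_x) ** 2

# Assumption: imitate a platform whose default integer is 32-bit
# by forcing both the data and the accumulator to int32.
wrapped = int(np.sum(squared.astype(np.int32), dtype=np.int32))

# A 64-bit accumulator has room for the full sum.
exact = int(np.sum(squared, dtype=np.int64))

print("int32 accumulator:", wrapped)
print("int64 accumulator:", exact)
print("differ by 2**32:  ", exact - wrapped == 2**32)
```

Using `dtype=np.int64` instead of `float64` keeps the arithmetic exact for integers; `float64` also works here because the sum is far below 2**53, the point where floats stop representing every integer exactly.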
ZiGaelle