
Question

Why does the same value, -3.29686744, result in a different mean and standard deviation?

Expected

import numpy as np

X = np.array([
    [-1.11793447, -3.29686744, -3.50615096],
    [-1.11793447, -3.29686744, -3.50615096],
    [-1.11793447, -3.29686744, -3.50615096]
])

mean = np.mean(X, axis=0)
print(f"mean is \n{mean}\nX-mean is \n{X-mean}\n")

sd = np.std(X, axis=0)
print(f"SD is \n{sd}\n")

Result:

mean is 
[-1.11793447 -3.29686744 -3.50615096]
X-mean is 
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]

SD is 
[0. 0. 0.]

Unexpected

X = np.array([
    [-1.11793447, -3.29686744, -3.50615096],
    [-1.11793447, -3.29686744, -3.50615096],
    [-1.11793447, -3.29686744, -3.50615096],
    [-1.11793447, -3.29686744, -3.50615096],
    [-1.11793447, -3.29686744, -3.50615096]
])

mean = np.mean(X, axis=0)
print(f"mean is \n{mean}\nX-mean is \n{X-mean}\n")

sd = np.std(X, axis=0)
print(f"SD is \n{sd}\n")

Result:

mean is 
[-1.11793447 -3.29686744 -3.50615096]
X-mean is 
[[0.0000000e+00 4.4408921e-16 4.4408921e-16]
 [0.0000000e+00 4.4408921e-16 4.4408921e-16]
 [0.0000000e+00 4.4408921e-16 4.4408921e-16]
 [0.0000000e+00 4.4408921e-16 4.4408921e-16]
 [0.0000000e+00 4.4408921e-16 4.4408921e-16]]

SD is 
[0.0000000e+00 4.4408921e-16 4.4408921e-16]

1 Answer

This is normal behavior when you consider that IEEE-754 double-precision floats are stored as 64 bits of data: one sign bit, 11 exponent bits, and 52 explicitly stored mantissa bits (53 bits of precision, counting the implicit leading bit). You can look up the details elsewhere.
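As a quick sketch using only the standard struct module (float64_fields is just an illustrative helper name), you can dump those three fields for any double:

import struct

def float64_fields(x: float) -> str:
    """Render a float64 as sign | exponent (11 bits) | fraction (52 bits)."""
    raw = struct.unpack(">Q", struct.pack(">d", x))[0]
    bits = f"{raw:064b}"
    return f"{bits[0]} | {bits[1:12]} | {bits[12:]}"

print(float64_fields(-3.29686744))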

The important part is that floats are effectively stored as integers with a scale factor. This is the binary analog of scientific notation. In fact, you can intuit exactly what is happening using more familiar decimal scientific notation.
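In Python you can see this scale-factor form directly with math.frexp, which splits a float into a mantissa and a power-of-two exponent:

import math

# frexp writes x as mantissa * 2**exponent, with |mantissa| in [0.5, 1)
mantissa, exponent = math.frexp(-3.29686744)
print(mantissa, exponent)        # -0.82421686 2
print(mantissa * 2**exponent)    # -3.29686744, round-trips exactly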

Let's say you have three digits of decimal precision available, and you want to compute the mean of [2.31e2, 2.31e2, 2.31e2]. The sum is 6.93e2, and so the mean is unambiguously 2.31e2. But what if your array were [2.31e2, 2.31e2, 2.31e2, 2.31e2, 2.31e2]? Now the sum is 1.155e3, but with only three digits available, the best you can do is 1.15e3 or 1.16e3, depending on whether you truncate or round. Dividing by five and truncating/rounding gives you either 2.30e2 or 2.32e2. There will generally be some quantization error the moment your sum has a scale different from that of your original numbers.
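You can replay this worked example with the standard decimal module by pinning the context to three significant digits (a sketch; the context's default rounding is half-even, so the five-element sum rounds up to 1.16e3):

from decimal import Decimal, getcontext

# Simulate arithmetic with only 3 significant decimal digits
getcontext().prec = 3

x = Decimal("2.31E+2")

total3 = x + x + x            # 693: still fits in 3 digits
print(total3 / 3)             # 231, the exact mean

total5 = x + x + x + x + x    # 1155 rounds to 1.16E+3
print(total5 / 5)             # 232, not 231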

Hopefully you can see that this translates directly to binary representations as well: you are seeing the differences in the last digit from the scale change during the mean operation.
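The same last-digit shift is easy to reproduce in NumPy with the value from the question:

import numpy as np

x = np.float64(-3.29686744)

print(np.full(3, x).mean() == x)   # True: the 3-element sum rounds cleanly
print(np.full(5, x).mean() == x)   # False: the 5-element mean is off by
                                   # one last bit, as in the output above
print(x - np.full(5, x).mean())    # 4.440892098500626e-16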

Notice that 2^-53 ~= 1.11e-16. Given that the scale of the elements in X is about 1, this corresponds very well to the quantization error you are seeing: the observed 4.4408921e-16 is exactly 4 * 2^-53 = 2^-51, the spacing between adjacent doubles at magnitudes between 2 and 4, where -3.29686744 lives.
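NumPy can confirm that spacing directly (np.spacing returns the gap between a value and the next representable double):

import numpy as np

print(2.0 ** -53)                 # 1.1102230246251565e-16
print(np.finfo(np.float64).eps)   # 2.220446049250313e-16, i.e. 2**-52

# Gap between adjacent doubles near 3.3: 2**-51 = 4 * 2**-53,
# exactly the residual in the question's output
print(np.spacing(3.29686744))     # 4.440892098500626e-16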

This is very closely related to "Is floating point math broken?"

Mad Physicist