I've encountered a weird phenomenon when I compute the dot product of matrices using numpy
. In short, for mathematically identical computations, numpy
is giving me different results.
The following code snippet illustrates the issue:
import numpy as np
np.set_printoptions(precision=10)
np.random.seed(2) # for reproducibility
# create matrices:
A = np.random.rand(300, 1)
A = A/np.sum(A)
A = np.repeat(A, 15, 1)
B = np.random.rand(300, 300)
# convert data to float32:
A_star = A.astype("float32")
B_star = B.astype("float32")
# do the matrix multiplication A'BA and take the diagonal:
res_star = np.diag(A_star.transpose().dot(B_star.dot(A_star)))
# print the results:
print(res_star)
Running this in python3.5
with numpy1.11.1
on a Windows machine prints the following array:
[0.5000683069 0.5000683069 0.5000683069 0.5000683069 0.5000683069
0.5000683069 0.5000683069 0.5000683069 0.5000683069 0.5000683069
0.5000683069 0.5000683069 0.5000681877 0.5000681877 0.5000683069]
Please note that the values in res_star[12:14]
are different from the other elements of the array - although mathematically one would expect them to coincide.
My own inquiries into these differences weren't particularly sucessful, but I think I narrowed it down a little:
- with dtype
float64
the values in the resulting array are identical - in case of
float32
values, differences in the values typically only start at the 7th decimal place (the region wherefloat32
precision ends)
Yet, to me these findings can't really explain the differences within the array since the same mathematical operation is taking place (they do explain differences in the values if I were to compute the same dot products on the same matrices with dtype float64
). Also, for different seeds and matrix sizes the resulting values might actually coincide - so this behavior is not consistent.
- on a UNIX machine the code snippet given above returns identical figures (at least in this particular configuration)
Now,
1) what is causing this behavior and
2) how can I ensure identical results within the array without changing the dtype to float64
?