
I'm comparing the numbers generated with np.random.normal on two different systems (details below) with the following code (I'm using the legacy np.random.seed because it is used by another program whose output I want to verify eventually) (1):

import numpy as np

np.random.seed(0)
x = np.random.normal(scale=1e-3, size=10**5)
np.save('test.npy', x)

Then I copy test.npy from one system to the other, where I compare the two versions:

>>> other = np.load('test.npy')
>>> (x != other).sum(), len(x)
(29, 100000)
>>> mask = x != other
>>> np.abs(x[mask] - other[mask])
array([5.42101086e-20, 1.35525272e-20, 2.71050543e-20, 5.42101086e-20,
       1.08420217e-19, 1.08420217e-19, 2.16840434e-19, 2.16840434e-19,
       1.35525272e-20, 1.08420217e-19, 1.08420217e-19, 5.42101086e-20,
       2.71050543e-20, 1.08420217e-19, 2.16840434e-19, 5.42101086e-20,
       2.71050543e-20, 2.16840434e-19, 2.16840434e-19, 2.71050543e-20,
       2.71050543e-20, 1.08420217e-19, 1.08420217e-19, 1.08420217e-19,
       5.42101086e-20, 1.08420217e-19, 1.08420217e-19, 5.42101086e-20,
       2.71050543e-20])
>>> x[mask]
array([ 4.52489093e-04,  9.78961454e-05, -1.47113076e-04, -3.67859222e-04,
       -5.33279620e-04,  8.40794952e-04, -7.75987295e-04,  1.34205479e-03,
        6.34459482e-05,  5.07109360e-04, -7.68363366e-04,  3.33350262e-04,
       -2.19367067e-04,  6.11402140e-04, -1.30486526e-03, -4.42699624e-04,
        1.45463287e-04, -1.22491651e-03,  1.05226781e-03, -2.43032730e-04,
       -2.40551279e-04,  4.95396595e-04, -7.25454745e-04, -8.50779215e-04,
       -2.66274662e-04,  7.28854386e-04,  8.38515107e-04,  3.36152654e-04,
       -1.26550328e-04])
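
As a sanity check, the absolute differences above are one or two units in the last place (ULP) of the corresponding values, which can be confirmed with np.spacing (the value picked below is just the first differing element from the output above):

```python
import numpy as np

# ULP (unit in the last place) at the magnitude of a differing value;
# the observed absolute differences are small integer multiples of it
val = np.float64(4.52489093e-04)            # first differing element
print(np.spacing(val))                      # ULP at that magnitude
print(5.42101086e-20 / np.spacing(val))     # observed diff expressed in ULPs, ~1
```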

So 29 out of 100,000 elements differ by a tiny amount, and I do not understand where this difference comes from. I verified the following:

  • Both systems run the same versions of Python and NumPy: python==3.9.4 and numpy==1.20.2 (obtained via python -m pip install numpy==1.20.2; I also checked with the latest version, numpy==1.23.0, and the results are exactly the same).
  • The RNG state (via np.random.get_state()) is the same on both systems, both before and after the call to np.random.normal.
  • The test.npy file survives the transfer intact: I saved and copied it multiple times and verified it via MD5 checksum.

So the difference must originate in the random number generation itself (1). However, I can't see how this is possible, given that both generators are initialized with the same random state.
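
For reference, this is roughly how I compared the states across machines (a minimal sketch; the helper name is my own, and it condenses the MT19937 key array and position into a short hash that is easy to diff between systems):

```python
import hashlib
import numpy as np

def state_digest():
    # Condense the legacy MT19937 state into a short, comparable string.
    # get_state() returns (name, key array, pos, has_gauss, cached_gaussian).
    name, keys, pos, has_gauss, cached_gaussian = np.random.get_state()
    payload = keys.tobytes() + pos.to_bytes(8, 'little') + bytes([has_gauss])
    return hashlib.md5(payload).hexdigest()

np.random.seed(0)
before = state_digest()
x = np.random.normal(scale=1e-3, size=10**5)
after = state_digest()
print(before, after)  # identical across both systems in my case
```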

Information about systems

System A (the one where test.npy is saved):

$ uname -a
Linux SystemA 3.10.0-1160.31.1.el7.x86_64 #1 SMP Thu Jun 10 13:32:12 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

(I also tested on another system, A2, which runs the same OS version as A but is equipped with different CPUs; the results didn't change from A to A2, which makes me suspect the OS/library version rather than the hardware.)

System B (the one where test.npy is loaded):

$ uname -a
Linux SystemB 5.4.0-113-generic #127-Ubuntu SMP Wed May 18 14:30:56 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Footnote (1): When I use the recommended way given in the docs of np.random.seed, i.e. rs = RandomState(MT19937(SeedSequence(0))), the discrepancy between the two systems remains. However, when I use np.random.default_rng(seed=0) instead, i.e. the new PCG64, the discrepancy goes away.
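
For completeness, the PCG64 variant I used looks like this (the output file name is just for illustration):

```python
import numpy as np

# New Generator API backed by the PCG64 bit generator
rng = np.random.default_rng(seed=0)
x = rng.normal(scale=1e-3, size=10**5)

# The same seed reproduces the stream exactly within one machine
y = np.random.default_rng(seed=0).normal(scale=1e-3, size=10**5)
print((x == y).all())  # → True
```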

a_guest
  • In case anyone else is confused by the [output](https://stackoverflow.com/questions/6943803/understanding-uname-output) of `uname -a` as well. – Michael Szczesny Jun 30 '22 at 14:50
  • Perhaps it is due to the properties of floating point numbers? See https://stackoverflow.com/questions/30065437/floating-point-math-in-python-numpy-not-reproducible-across-machines – Gediminas Jun 30 '22 at 17:01
  • Related (not intended to answer this question): https://stackoverflow.com/questions/69761837/in-r-how-can-i-get-prng-to-give-identical-floating-point-numbers-between-platfo – Peter O. Jun 30 '22 at 22:35

1 Answer


Given that the differences are all so small, the underlying bit generators are evidently doing the same thing; the discrepancy comes from differences in the underlying math library (libm).

NumPy's legacy generator uses sqrt and log from libm, and you can see that it pulls in these symbols. First locate the shared object that provides the generator:

import numpy as np

print(np.random.mtrand.__file__)

then dumping symbols with:

nm -C -gD mtrand.*.so | grep GLIBC

where that mtrand filename comes from the above output.

The dump includes many other symbols as well, but the libm ones (log, sqrt) are the likely source of the differences.

At a guess it's the log implementation that differs, so you could test with:

import numpy as np

np.random.seed(0)

# draw pairs of uniforms on [-1, 1) and form the squared radius,
# i.e. the kind of quantity the legacy normal generator passes to log()
x = 2 * np.random.rand(2, 10**5) - 1
r2 = np.sum(x * x, axis=0)

np.save('test-log.npy', np.log(r2))

and compare between these two systems.
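
For context, the legacy generator's normal draw is (to my reading of NumPy's randomkit source) the Marsaglia polar method, which is exactly where log and sqrt enter. A pure-Python sketch, checked against the seeded legacy stream:

```python
import math
import numpy as np

def legacy_gauss(uniforms):
    # Marsaglia polar method, paraphrased from NumPy's legacy rk_gauss;
    # math.log/math.sqrt call into the C library's libm, as the C code does
    while True:
        x1 = 2.0 * next(uniforms) - 1.0
        x2 = 2.0 * next(uniforms) - 1.0
        r2 = x1 * x1 + x2 * x2
        if 0.0 < r2 < 1.0:  # reject points outside the unit circle
            break
    f = math.sqrt(-2.0 * math.log(r2) / r2)
    return f * x2, f * x1  # the second value is cached for the next call

np.random.seed(0)
pair = legacy_gauss(iter(np.random.random(100)))  # feed the same uniform stream

np.random.seed(0)
print(np.allclose(pair, np.random.standard_normal(2)))  # → True
```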

Sam Mason
  • This seems to be the reason, indeed. I compared `r2` (also made sure that `0 < r2 < 1` is satisfied) and the numbers agree between the two systems. Then I compare `np.log(r2)` and these do disagree. It's interesting that for the resulting normal values only such a small fraction is affected; is there a way to see the implementation of `libm` functions? And how can I find out the `libm` versions? I checked `ldconfig -p | grep libm\\.` and it gives me `libm.so.6 (libc6,x86-64, OS ABI: Linux 2.6.32)` vs. `libm.so.6 (libc6,x86-64, OS ABI: Linux 3.2.0)`; are those the library version numbers? – a_guest Jun 30 '22 at 19:16
  • However, how would that explain that for `PCG64` there are no differences? Is it just "luck" for the `10**5` samples I created? I see that the new version of `random_normal` uses [`npy_log1p`](https://github.com/numpy/numpy/blob/b0b523ec0980e46ca735ce96c7c976c4fdd28424/numpy/random/src/distributions/distributions.c#L158) but that presumably uses `log` under the hood, too? Also, it uses [`exp`](https://github.com/numpy/numpy/blob/b0b523ec0980e46ca735ce96c7c976c4fdd28424/numpy/random/src/distributions/distributions.c#L166) which likely has different implementations, too (?). – a_guest Jun 30 '22 at 20:24
  • That may be because `PCG64` is part of a new system in NumPy for pseudorandom generation, since version 1.17. See also [this question](https://stackoverflow.com/questions/60599149/what-is-the-difference-between-numpy-randoms-generator-class-and-np-random-meth/60609480#60609480). Also the differences in your question may be due to implementation differences in floating-point numbers; see also [this question](https://stackoverflow.com/questions/69761837/in-r-how-can-i-get-prng-to-give-identical-floating-point-numbers-between-platfo) involving the R language. – Peter O. Jun 30 '22 at 22:40
  • @a_guest determining where code comes from is very system specific. I'd suggest asking your distribution's package manager (e.g. under debian/ubuntu you'd use [dpkg](https://unix.stackexchange.com/q/224186/90376)) and then search for the source code associated with that version of package. – Sam Mason Jul 01 '22 at 08:29