I tried to normalize a probability distribution of the form $p_k := 2^{-k^2}$ for $k \in {1,\dots,n}$ for $n = 8$ in numpy/Python 3.8 along the following lines, using an equivalent of MATLAB's num2hex a la C++ / Python Equivalent of Matlab's num2hex. The sums of the normalized distributions differ in Python and MATLAB R2020a.
If $n < 8$, there is no discrepancy.
What is going on, and how can I force Python to produce the same result as MATLAB for $n > 7$? It's hard for me to tell which of these is IEEE 754 compliant (maybe both, with a discrepancy in grouping that affects a carry[?]) but I need the MATLAB behavior.
I note that there are still discrepancies in rounding between numpy and MATLAB per Differences between Matlab and Numpy and Python's `round` function (which I verified myself) but not sure this has any bearing.
import numpy as np
import struct # for MATLAB num2hex equivalent below
n = 8
unnormalizedPDF = np.array([2**-(k**2) for k in range(1,n+1)])
# MATLAB num2hex equivalent a la https://stackoverflow.com/questions/24790722/
num2hex = lambda x : hex(struct.unpack('!q', struct.pack('!d',x))[0])
hexPDF = [num2hex(unnormalizedPDF[k]/np.sum(unnormalizedPDF)) for k in range(0,n)]
print(hexPDF)
# ['0x3fec5862805436a4',
# '0x3fbc5862805436a4',
# '0x3f6c5862805436a4',
# '0x3efc5862805436a4',
# '0x3e6c5862805436a4',
# '0x3dbc5862805436a4',
# '0x3cec5862805436a4',
# '0x3bfc5862805436a4']
hexPDFSum = num2hex(np.sum(unnormalizedPDF/np.sum(unnormalizedPDF)))
print(hexPDFSum)
# 0x3ff0000000000000
Here is the equivalent in MATLAB:
n = 8;
unnormalizedPDF = 2.^-((1:n).^2);
num2hex(unnormalizedPDF/sum(unnormalizedPDF))
% ans =
%
% 8×16 char array
%
% '3fec5862805436a4'
% '3fbc5862805436a4'
% '3f6c5862805436a4'
% '3efc5862805436a4'
% '3e6c5862805436a4'
% '3dbc5862805436a4'
% '3cec5862805436a4'
% '3bfc5862805436a4'
num2hex(sum(unnormalizedPDF/sum(unnormalizedPDF)))
% ans =
%
% '3fefffffffffffff'
Note that the unnormalized distributions are exactly the same, but the sums of their normalizations differ by a single bit. If I use $n = 7$, everything agrees (both give 0x3fefffffffffffff), and both give the same results for $n < 7$ as well.