Why does np.astype('uint8') give different results on Windows versus Mac?

Question

I have a (1000,1000,3) shaped numpy array (dtype='float32') and when I cast it to dtype='uint8' I get different results on Windows versus Mac.

Array is available here: https://www.dropbox.com/s/jrs4n2ayh86s0fn/image.npy?dl=0

On Mac

>>> import numpy as np
>>> X = np.load('image.npy')
>>> X = X.astype('uint8')
>>> X.sum()
167942490

On Windows

>>> import numpy as np
>>> X = np.load('image.npy')
>>> X = X.astype('uint8')
>>> X.sum()
323510676

Also reproduces with this array:

import numpy as np
X = np.array([
[[46410., 42585., 32640.],
 [45645., 41820., 31875.],
 [45390., 41310., 32130.]],

[[44880., 41055., 31110.],
 [44115., 40290., 30345.],
 [46410., 42330., 33150.]],

[[45390., 41310., 32130.],
 [46155., 42075., 32895.],
 [42840., 38760., 30090.]]], dtype=np.float32)

print(X.sum(), X.astype('uint8').sum())

Prints 1065135.0 2735 on Windows and 1065135.0 1860 on Mac.

Here are the results with different OS, Python, and NumPy versions:

Python 3.8.8  (Win) Numpy 1.22.4 => 1065135.0 2735 
Python 3.10.6 (Mac) Numpy 1.24.2 => 1065135.0 2735 
Python 3.7.12 (Mac) Numpy 1.21.6 => 1065135.0 1860

Can you reproduce this with a smaller array that you can post here, so we don't have to download from dropbox? — Barmar, Mar 03 '23 at 22:47
I don't have a Windows machine. But I tried it on Mac and Linux, and they both printed `1065135.0 2735` — Barmar, Mar 03 '23 at 22:57
For a small array like this you can just do `print(X, X.astype('uint8'), sep='\n')` and compare the arrays directly. — Barmar, Mar 03 '23 at 22:58
For the small, 27-element array, did you actually observe different results on Mac and Windows, or are you just looking at the difference between `X.sum()` and `X.astype('uint8').sum()` and being surprised by that? Because uint8 cannot hold values that big, so of course `X.astype('uint8').sum()` isn't going to give the same result as `X.sum()`. — user2357112, Mar 03 '23 at 22:59
@user2357112 Look at the original code blocks, it's comparing `X.sum()` after conversion between platforms. — Barmar, Mar 03 '23 at 23:00
They aren't meant to give the same result. `X.sum()` is the same on Mac and Windows, but `X.astype('uint8').sum()` differs, for the same `X`. — nickponline, Mar 03 '23 at 23:01
@Barmar: Yeah, but it wasn't clear whether the smaller repro was actually tested on both platforms at the time I was writing that comment. — user2357112, Mar 03 '23 at 23:01
I'm on Python 3.9.2, maybe it's a version difference? What version are you using on Windows? — Barmar, Mar 03 '23 at 23:03
A result of 1860 is baffling. Seeing different results on Mac and Windows is understandable for the original code - `numpy.sum` is using a different dtype on Windows due to the different size of C `long`, so the sum is probably overflowing on Windows - but the 1860 does not look like it has anything to do with integer overflow. — user2357112, Mar 03 '23 at 23:04
`X.astype('uint8')` is `array([[[ 74, 89, 128], [ 77, 92, 131], [ 78, 94, 130]], [[ 80, 95, 134], [ 83, 98, 137], [ 74, 90, 126]], [[ 78, 94, 130], [ 75, 91, 127], [ 88, 104, 138]]], dtype=uint8)` — Barmar, Mar 03 '23 at 23:04
Yes! It produces a different result for Python 3.10.6 vs. Python 3.7.12 (both on Mac). — nickponline, Mar 03 '23 at 23:04
What does `X.astype('uint8')` give on the platform where you get 1860? — user2357112, Mar 03 '23 at 23:05
Python 3.8.8 (Windows): 1065135.0 2735; Python 3.7.12 (Mac): 1065135.0 1860; Python 3.10.6 (Mac): 1065135.0 2735 — nickponline, Mar 03 '23 at 23:05
What's your NumPy version on the Python 3.7.12 setup? `import numpy; print(numpy.__version__)` — user2357112, Mar 03 '23 at 23:07
Or maybe it's the version of `numpy`: it looks like the result is incorrect in 1.21.6 and correct in 1.22.4+. — nickponline, Mar 03 '23 at 23:08
Could you show us the results of `print(repr(X))` and `print(repr(X.astype('uint8')))` on the Python 3.7.12 setup? — user2357112, Mar 03 '23 at 23:19
(Checking the original numbers again, even with the big array, those are about an order of magnitude too low for the difference to be due to the Windows `long` size thing I was thinking about earlier - it's probably the same issue as with the small array.) — user2357112, Mar 03 '23 at 23:21
@user2357112 ```Python 3.7.12 Mac + 1.21.6 array([[[46410., 42585., 32640.], [45645., 41820., 31875.], [45390., 41310., 32130.]], [[44880., 41055., 31110.], [44115., 40290., 30345.], [46410., 42330., 33150.]], [[45390., 41310., 32130.], [46155., 42075., 32895.], [42840., 38760., 30090.]]], dtype=float32) array([[[ 0, 0, 255], [ 0, 0, 255], [ 0, 0, 255]], [[ 0, 0, 255], [ 0, 0, 255], [ 0, 0, 0]], [[ 0, 0, 255], [ 0, 0, 0],[ 88, 104, 138]]], dtype=uint8)``` — nickponline, Mar 03 '23 at 23:29
Okay, something's clearly going wrong with the type conversion. The last 3 elements are fine, but everything else has been corrupted somehow. I didn't see any relevant bug fixes in the changelog, and my tests show no such corruption with NumPy 1.21.6 on Linux. This is really weird. — user2357112, Mar 03 '23 at 23:32

score 5 · Accepted Answer · answered Mar 04 '23 at 00:46

This problem is due to a bad conversion causing integer overflows. NumPy uses C casts to convert values, but converting floats outside the range 0-255 to 8-bit unsigned integers is undefined behaviour in C, which is why the result can differ from one platform or build to another. We tried to do our best to report errors in this case without impacting performance, but this is not possible in all cases. The latest versions of NumPy should fix this, but the issue is still partially unsolved. See the 1.24.0 release notes, this issue and this one, as well as this PR (AFAIK, the first reference to this issue is found here).
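To make the 2735 result less mysterious, here is a minimal sketch of where numbers such as 74, 89 and 128 come from. It assumes the "wrapping" setups effectively keep only the low 8 bits of the integer part, and reproduces that explicitly by going through `int64` first (an in-range, well-defined conversion). The sample values are a subset of the question's array; the names `as_int` and `wrapped` are only illustrative.

```python
import numpy as np

# A few values taken from the small array in the question.
X = np.array([46410., 42585., 32640., 42840., 38760., 30090.], dtype=np.float32)

# float32 -> int64 is well defined here: every value fits in int64.
as_int = X.astype(np.int64)

# int64 -> uint8 keeps only the low 8 bits, i.e. the value modulo 256.
wrapped = as_int.astype(np.uint8)

print(wrapped.tolist())         # [74, 89, 128, 88, 104, 138]
print((as_int % 256).tolist())  # same numbers, computed explicitly
```

These match the first and last rows of the `uint8` array posted in the comments for the setups that print 2735. The direct `float32` to `uint8` cast is not guaranteed to behave this way, which is the point of the answer.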

Anyway, while the error may not be detected on your target machine, casting floating-point numbers outside the range 0-255 to `uint8` is unsafe and you should not expect a correct result. You need to adapt your code so there is no overflow in the first place. I also advise you to use at least version 1.24.0 of NumPy to better track such errors.
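One possible way to avoid the overflow, sketched here under the assumption that saturating at 0 and 255 is what you want (rescaling the data may be more appropriate for an image), is to clip before casting. The sample values and the names `clipped` and `as_uint8` are made up for illustration.

```python
import numpy as np

# Hypothetical sample with values below, inside and above the uint8 range.
X = np.array([-3.5, 0.0, 127.6, 255.0, 46410.0], dtype=np.float32)

# Bring every value into [0, 255] while it is still a float.
clipped = np.clip(X, 0, 255)

# Every value is now representable, so the cast is well defined.
# Note that astype truncates toward zero (127.6 becomes 127).
as_uint8 = clipped.astype(np.uint8)

print(as_uint8.tolist())  # [0, 0, 127, 255, 255]
```

Because no out-of-range value ever reaches the cast, this should give the same output regardless of platform or NumPy version.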

Related post: Why does numpy handle overflows inconsistently?
