I have a (1000,1000,3) shaped numpy array (dtype='float32') and when I cast it to dtype='uint8' I get different results on Windows versus Mac.

Array is available here: https://www.dropbox.com/s/jrs4n2ayh86s0fn/image.npy?dl=0

On Mac

>>> import numpy as np
>>> X = np.load('image.npy')
>>> X = X.astype('uint8')
>>> X.sum()
167942490

On Windows

>>> import numpy as np
>>> X = np.load('image.npy')
>>> X = X.astype('uint8')
>>> X.sum()
323510676

Also reproduces with this array:

import numpy as np
X = np.array([
[[46410., 42585., 32640.],
 [45645., 41820., 31875.],
 [45390., 41310., 32130.]],

[[44880., 41055., 31110.],
 [44115., 40290., 30345.],
 [46410., 42330., 33150.]],

[[45390., 41310., 32130.],
 [46155., 42075., 32895.],
 [42840., 38760., 30090.]]], dtype=np.float32)

print(X.sum(), X.astype('uint8').sum())

Prints 1065135.0 2735 on Windows and 1065135.0 1860 on Mac.

Here are the results for different OS, Python, and NumPy versions:

Python 3.8.8  (Win) Numpy 1.22.4 => 1065135.0 2735 
Python 3.10.6 (Mac) Numpy 1.24.2 => 1065135.0 2735 
Python 3.7.12 (Mac) Numpy 1.21.6 => 1065135.0 1860 
nickponline
  • Can you reproduce this with a smaller array that you can post here, so we don't have to download from dropbox? – Barmar Mar 03 '23 at 22:47
  • @Barmar I added a smaller example. – nickponline Mar 03 '23 at 22:52
  • I don't have a Windows machine. But I tried it on Mac and Linux, and they both printed `1065135.0 2735` – Barmar Mar 03 '23 at 22:57
  • For a small array like this you can just do `print(X, X.astype('uint8'), sep='\n')` and compare the arrays directly. – Barmar Mar 03 '23 at 22:58
  • @Barmar on my mac it prints: `1065135.0 1860` – nickponline Mar 03 '23 at 22:59
  • For the small, 27-element array, did you actually observe different results on Mac and Windows, or are you just looking at the difference between `X.sum()` and `X.astype('uint8').sum()` and being surprised by that? Because uint8 cannot hold values that big, so of course `X.astype('uint8').sum()` isn't going to give the same result as `X.sum()`. – user2357112 Mar 03 '23 at 22:59
  • @user2357112 Look at the original code blocks, it's comparing `X.sum()` after conversion between platforms. – Barmar Mar 03 '23 at 23:00
  • They aren't meant to give the same result. `X.sum()` is the same on Mac and Windows, but `X.astype('uint8').sum()` differs for the same `X`. – nickponline Mar 03 '23 at 23:01
  • @Barmar: Yeah, but it wasn't clear whether the smaller repro was actually tested on both platforms at the time I was writing that comment. – user2357112 Mar 03 '23 at 23:01
  • I'm on a M1 Mac if it makes a difference. – Barmar Mar 03 '23 at 23:02
  • I'm also on Mac M1 Python 3.7.12 - am I losing my mind? – nickponline Mar 03 '23 at 23:03
  • I'm on Python 3.9.2, maybe it's a version difference? What version on Windows? – Barmar Mar 03 '23 at 23:03
  • A result of 1860 is baffling. Seeing different results on Mac and Windows is understandable for the original code - `numpy.sum` is using a different dtype on Windows due to the different size of C `long`, so the sum is probably overflowing on Windows - but the 1860 does not look like it has anything to do with integer overflow. – user2357112 Mar 03 '23 at 23:04
  • `X.astype('uint8')` is `array([[[ 74, 89, 128], [ 77, 92, 131], [ 78, 94, 130]], [[ 80, 95, 134], [ 83, 98, 137], [ 74, 90, 126]], [[ 78, 94, 130], [ 75, 91, 127], [ 88, 104, 138]]], dtype=uint8)` – Barmar Mar 03 '23 at 23:04
  • Yes! it produces a different result for Python 3.10.6 vs. Python 3.7.12 (both on Mac) – nickponline Mar 03 '23 at 23:04
  • What does `X.astype('uint8')` give on the platform where you get 1860? – user2357112 Mar 03 '23 at 23:05
  • Windows version is Python 3.8.8 – nickponline Mar 03 '23 at 23:05
  • Sounds like a bug that was fixed in 3.8. – Barmar Mar 03 '23 at 23:05
  • Python 3.8.8 (Windows): `1065135.0 2735`; Python 3.7.12 (Mac): `1065135.0 1860`; Python 3.10.6 (Mac): `1065135.0 2735` – nickponline Mar 03 '23 at 23:05
  • What's your NumPy version on the Python 3.7.12 setup? `import numpy; print(numpy.__version__)` – user2357112 Mar 03 '23 at 23:07
  • Or maybe it's the version of `numpy`: it looks like it is incorrect in 1.21.6 and correct in 1.22.4+ – nickponline Mar 03 '23 at 23:08
  • Added different versions in the edit above. – nickponline Mar 03 '23 at 23:10
  • Could you show us the results of `print(repr(X))` and `print(repr(X.astype('uint8')))` on the Python 3.7.12 setup? – user2357112 Mar 03 '23 at 23:19
  • (Checking the original numbers again, even with the big array, those are about an order of magnitude too low for the difference to be due to the Windows `long` size thing I was thinking about earlier - it's probably the same issue as with the small array.) – user2357112 Mar 03 '23 at 23:21
  • @user2357112 ```Python 3.7.12 Mac + 1.21.6 array([[[46410., 42585., 32640.], [45645., 41820., 31875.], [45390., 41310., 32130.]], [[44880., 41055., 31110.], [44115., 40290., 30345.], [46410., 42330., 33150.]], [[45390., 41310., 32130.], [46155., 42075., 32895.], [42840., 38760., 30090.]]], dtype=float32) array([[[ 0, 0, 255], [ 0, 0, 255], [ 0, 0, 255]], [[ 0, 0, 255], [ 0, 0, 255], [ 0, 0, 0]], [[ 0, 0, 255], [ 0, 0, 0],[ 88, 104, 138]]], dtype=uint8)``` – nickponline Mar 03 '23 at 23:29
  • Okay, something's clearly going wrong with the type conversion. The last 3 elements are fine, but everything else has been corrupted somehow. I didn't see any relevant bug fixes in the changelog, and my tests show no such corruption with NumPy 1.21.6 on Linux. This is really weird. – user2357112 Mar 03 '23 at 23:32
  • You want to use `(X%256).astype('uint8')`. – Guimoute Mar 04 '23 at 00:55

1 Answer

This problem is due to an out-of-range conversion that overflows the target integer type. NumPy uses C casts to convert values, and converting a float outside the range 0-255 to an 8-bit unsigned integer is undefined behaviour in C, so the result can differ between platforms, compilers, and NumPy builds. We tried to do our best to report errors in this case without impacting performance, but this is not possible in all cases. The latest versions of NumPy should fix this, but the issue is still partially unsolved. See the 1.24.0 release notes, this issue and this one, as well as this PR (AFAIK, the first reference to this issue is found here).
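
To see whether an array is affected before casting it, one option is to test the values against the bounds of the target type. The sketch below is only an illustration: `out_of_range_mask` is a hypothetical helper, not part of NumPy, and the check is deliberately conservative (it also flags fractional values such as -0.5 that would truncate to an in-range integer).

```python
import numpy as np

def out_of_range_mask(values, dtype=np.uint8):
    """Hypothetical helper: flag elements the integer dtype cannot represent."""
    info = np.iinfo(dtype)
    # NaN and infinities have no integer equivalent either, so flag them too.
    return ~np.isfinite(values) | (values < info.min) | (values > info.max)

X = np.array([46410.0, 255.0, 0.0, -1.0, np.nan], dtype=np.float32)
mask = out_of_range_mask(X)
print(mask)        # which elements are unsafe to cast
print(mask.sum())  # how many of them there are
```

If the mask contains any `True`, the result of `X.astype('uint8')` for those elements should not be relied on.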

Anyway, while the error may not be detected on your target machine, casting floating-point numbers outside the range 0-255 is unsafe and you should not expect a correct result. You need to adapt your code so there is no overflow in the first place. I also advise you to use at least version 1.24.0 of NumPy to better track such errors.
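
As a minimal sketch of what "no overflow in the first place" could look like, here are two common ways to bring the values into range before the cast. Which one is appropriate depends on what the data means, and that is an assumption you have to make yourself: saturating is typical for image data, while wrapping reproduces the modulo-256 values seen in the comments. The sample values are made up for illustration.

```python
import numpy as np

X = np.array([46410.0, 300.0, -5.0, 12.0], dtype=np.float32)

# Option 1: saturate. Out-of-range values are pinned to 0 or 255
# before the cast, so the cast itself never sees an invalid value.
saturated = np.clip(X, 0, 255).astype(np.uint8)

# Option 2: wrap modulo 256. Go through int64 first (fine as long as the
# values are finite and fit in int64), then reduce into 0..255.
wrapped = (X.astype(np.int64) % 256).astype(np.uint8)

print(saturated)  # values: 255, 255, 0, 12
print(wrapped)    # values: 74, 44, 251, 12
```

Both variants only ever cast values that already lie in 0-255, so the result should not depend on the platform or the NumPy version.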

Related post: Why does numpy handle overflows inconsistently?

Jérôme Richard