45

Consider the following brief NumPy session showcasing the uint64 data type

import numpy as np
 
a = np.zeros(1,np.uint64)
 
a
# array([0], dtype=uint64)
 
a[0] -= 1
a
# array([18446744073709551615], dtype=uint64)
# this is 0xffff ffff ffff ffff, as expected

a[0] -= 1
a
# array([0], dtype=uint64)
# what the heck?

I'm utterly confused by this last output.

I would expect 0xFFFF'FFFF'FFFF'FFFE.

What exactly is going on here?

My setup:

>>> sys.platform
'linux'
>>> sys.version
'3.10.5 (main, Jul 20 2022, 08:58:47) [GCC 7.5.0]'
>>> np.version.version
'1.23.1'
Albert.Lang
  • 523
  • 3
  • 8
  • 5
    Interestingly I get a `OverflowError: long too big to convert` for the second decrement. What version of numpy is this, and in what environment? – Dan Mašek Apr 30 '23 at 16:53
  • 3
    Does it matter if you make the 1 an np array of uint64 type also? – ProfDFrancis Apr 30 '23 at 16:55
  • 4
    Same as Dan, I get `OverflowError: Python int too large to convert to C long` – njzk2 Apr 30 '23 at 16:55
  • 4
    `b = np.zeros(1,np.uint64); b[0] = 1; a - b` gives, as one may expect, `array([18446744073709551614], dtype=uint64)`. Similarly `a[0] -= np.uint64(1)` also works. I suspect the `-=` is allowing a conversion to the right-hand-side value type – njzk2 Apr 30 '23 at 16:56
  • 5
    I'm seeing the same issue as OP. Running 3.11 on MacOS. `a[0] -= 1` leaves the value at zero, while `a -= 1` sets the value to FF..FF. – Frank Yellin Apr 30 '23 at 16:59
  • 3
    I got some strange result as well, but not the same one: 0, 0xffffffffffffffff (good so far), then 0x8000000000000000 (???) – harold Apr 30 '23 at 17:03
  • 1
    @ProfDFrancis see comment by njzk2. numpy scalars like `one = np.uint64(1)` also work as expected, `a[0] -= one` even gives an overflow warning at the first subtraction. – Albert.Lang Apr 30 '23 at 17:22
  • 3
    Even `a[0] -= 0` does the same for me. – Kelly Bundy Apr 30 '23 at 18:09
  • 2
    I got the same result as @harold. After a little testing, I found that `a[0] = 1152921504606847360; a[0] -= 1` results in `1152921504606847488` ( it is increasing). Comparing the bit representations of these numbers, it appears that they are truncated at the upper 52 bits after subtraction. Since this is the same length as the float64 fractional bits, this is probably what is happening. – ken Apr 30 '23 at 18:28
  • 2
    @harold: Now that there are answers explaining that floating point conversion is involved, it makes me wonder if you're seeing hardware behaviour. Out-of-range FP to signed-int conversion on x86 produced `0x8000...` (INT_MIN / LONG_MIN), the "indefinite integer" value (https://www.felixcloutier.com/x86/cvtsd2si). AVX-512 FP to unsigned int conversion produces 0xFFFF... for out-of-range inputs. (https://www.felixcloutier.com/x86/vcvtsd2usi). – Peter Cordes May 01 '23 at 14:36
  • @harold: IDK if either result could be explained that way, or if Python range-checks would come into play. Or if not actual AVX-512 hardware, then perhaps a Python compiled with MSVC's FP -> uint64 conversion routines which define the result of out-of-range conversions to match what AVX-512 hardware does: https://devblogs.microsoft.com/cppblog/microsoft-visual-studio-2022-and-floating-point-to-integer-conversions/ . The conversion that gives `0xffff...` is a float with value `-1` which is indeed out-of-range for `uint64_t`, but the 2nd conversion is also out-of-range for `uint64_t`. – Peter Cordes May 01 '23 at 14:38

5 Answers

40

By default, NumPy converts Python int objects to numpy.int_, a signed integer dtype corresponding to C long. (This decision was made back in the early days when Python int also corresponded to C long.)

There is no integer dtype big enough to hold all values of numpy.uint64 dtype and numpy.int_ dtype, so operations between numpy.uint64 scalars and Python int objects produce float64 results instead of integer results. (Operations between uint64 arrays and Python ints may behave differently, as the int is converted to a dtype based on its value in such operations, but a[0] is a scalar.)
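A quick way to see this promotion (a sketch, using NumPy's public `promote_types`; this reflects the NumPy 1.x rules described above):

```python
import numpy as np

# uint64 combined with a signed 64-bit integer has no common integer dtype,
# so NumPy promotes the pair to float64.
print(np.promote_types(np.uint64, np.int64))  # float64

# Two signed dtypes still have an integer common type:
print(np.promote_types(np.int64, np.int32))   # int64
```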

Your first subtraction produces a float64 with value -1, and your second subtraction produces a float64 with value 2**64 (since float64 doesn't have enough precision to perform the subtraction exactly). Both of these values are out of range for uint64 dtype, so converting back to uint64 for the assignment to a[0] produces undefined behavior (inherited from C - NumPy just uses a C cast).
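The two intermediate float64 values can be reproduced explicitly by computing the same subtractions in plain float64 (a sketch of what the scalar operations work out to under these promotion rules):

```python
import numpy as np

# First subtraction: 0 - 1 computed in float64 gives -1.0, out of range for uint64.
r1 = np.float64(0) - 1
print(r1)  # -1.0

# Second subtraction: 2**64 - 1 is not representable in float64 (53-bit mantissa),
# so it rounds to 2**64 before the subtraction; subtracting 1 cannot change it.
r2 = np.float64(2**64 - 1) - 1
print(r2 == 2.0**64)  # True
```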

On your machine, this happened to produce wraparound behavior, so -1 wrapped around to 18446744073709551615 and 2**64 wrapped around to 0, but that's not a guarantee. You might see different behavior on other setups. People in the comments did see different behavior.

user2357112
  • 260,549
  • 28
  • 431
  • 505
  • 10
    I am curious, as someone who only uses numpy occasionally, does the community feel this is a defect? This is incredibly surprising since most languages with explicitly-specified fixed-width types have their behaviour defined similarly or identically to C. – Chuu May 01 '23 at 13:13
  • 3
    @Chuu: I dunno about the general community, but I'd personally prefer if mixing uint64 and signed integers produced uint64 output instead of float64 output. (C handles this specific case better, but it's got its own issues - for example, integer arithmetic on two unsigned operands can produce *signed* overflow and undefined behavior in C, due to how unsigned types smaller than int get promoted to signed int. I'm not sure if NumPy has any protections in place to avoid this issue.) – user2357112 May 01 '23 at 13:30
  • 4
    Isn’t inheriting UB from C rather worrying, given that the C standard allows a program that invokes UB to do literally anything? Admittedly, most (all?) of the fun stuff that has surfaced lately, like not generating any code at all for an execution path that always invokes UB, hinges on the UB being detected at compile time, which this won’t be, but still... – Ture Pålsson May 01 '23 at 14:56
  • 1
    @TurePålsson: "Isn’t inheriting UB from C rather worrying" - absolutely! The behavior we're seeing here already sucks, but it could be arbitrarily worse some day, and if it does get worse, there's no telling how long the problem might pass unnoticed. – user2357112 May 01 '23 at 15:09
  • @TurePålsson: Yup. Fortunately real-world C implementations will produce a value for out-of-range FP to int conversions (at least when the value being converted is a run-time variable that might be in range). Some compilers even go as far as officially defining the behaviour, such as MSVC: https://devblogs.microsoft.com/cppblog/microsoft-visual-studio-2022-and-floating-point-to-integer-conversions/ (article re: changing the out-of-range conversion result for FP-to-unsigned to match AVX-512 instructions which produce all-ones, 0xfff... in that case.) – Peter Cordes May 01 '23 at 18:48
  • @TurePålsson: A C implementation would be *allowed* to implement FP to integer conversions in a way that crashes on out-of-bounds floats, but I think most users would consider that undesirable. OTOH, current implementations for current CPUs would have to go out of their way to crash; the thing that's most efficient for well-defined cases will just produce some value for out-of-range. Maybe not always the same value for every out-of-range input, especially not inputs that are in-range for `int64_t` vs. ones that aren't. (e.g. modern x86 without AVX-512 can only do FP to int64_t natively) – Peter Cordes May 01 '23 at 18:53
  • @PeterCordes: By the same argument, I think most users would view as undesirable code generation which would cause a function like `unsigned mul_mod_65536(unsigned short x, unsigned short y) { return (x*y) & 0xFFFF; }` to arbitrarily corrupt memory if `x` exceeds `INT_MAX/y`, but that doesn't mean gcc isn't designed to occasionally behave in such fashion anyhow. – supercat May 01 '23 at 22:39
  • The third paragraph implies that float64 doesn't have enough precision to subtract one from negative one (`-1-1`). Any floating point system can do this calculation exactly. Please clarify. – Joooeey May 07 '23 at 06:39
  • @Joooeey: float64 does not have enough precision to exactly subtract 1 from 2**64-1. We are not subtracting 1 from -1. – user2357112 May 07 '23 at 07:54
  • That doesn't clarify anything. Where does 2**64-1 come from and when is 1 subtracted from that number? Please edit your answer to make it more clear. – Joooeey May 07 '23 at 07:58
  • Isn't the float64 with value -1 converted to uint64 before the second subtraction? – Joooeey May 07 '23 at 08:10
  • Yes, and the conversion to uint64 produces undefined behavior, which in this case manifested as a result of 2**64-1. Then we try to subtract 1 from that. – user2357112 May 07 '23 at 08:16
  • Now I see that this is in the last paragraph. I see that the rationale for organizing your paragraphs is to separate numpy implementation details from environment implementation details but that makes your answer hard to follow. – Joooeey May 07 '23 at 08:54
12

a[0] - 1 is 1.8446744073709552e+19, a numpy.float64. That type can't retain all the precision, so its value is 18446744073709551616 = 2**64, which, when written back into a with dtype np.uint64, becomes 0.
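A quick check, printing that float64 with all its digits:

```python
# 1.8446744073709552e+19 expands to exactly 2**64:
print(f"{1.8446744073709552e+19:.0f}")  # 18446744073709551616
print(1.8446744073709552e+19 == 2**64)  # True
```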

Kelly Bundy
  • 23,480
  • 7
  • 29
  • 65
  • 10
    NumPy floating->integer conversion with a floating point value out of the integer type's range is undefined behavior (inherited from C), so there's no guarantee the result is actually 0 - it could be anything. – user2357112 May 01 '23 at 01:37
  • 1
    @user2357112 Yeah I had actually written a second paragraph about that, also noting harold's different result, but then decided to keep this brief, just explaining the observed behavior of the OP. – Kelly Bundy May 01 '23 at 12:05
  • Doing what OP did produces a problem due to implicit type conversions. If you do `a[0] -= np.uint64(1)` you get a RuntimeWarning: overflow encountered in scalar subtract. Doing `a[0] = a[0] - np.uint64(1)` produces the expected result. – JamieSee May 02 '23 at 20:25
  • @JamieSee No, I get no warning/error with that, and the number becomes 18446744073709551614. [Demo](https://ato.pxeger.com/run?1=m72soLIkIz9vwYKlpSVpuhY3azJzC_KLShTySnMLKhUSixXyCrgUuBIVbIEMvarUovxiDUMdILM0M6_EzESTq6AIyNAoSi0o0kjU1OTiSow2iFXQtVUwxCkD16xhiK4d4gaoU2BOAgA). – Kelly Bundy May 02 '23 at 20:38
  • @KellyBundy Interesting. What version of Python and Numpy? I tried this using Python 3.11.0 and numpy 1.24.3. – JamieSee May 03 '23 at 21:37
  • @JamieSee You can try it there yourself. [3.10.6 and 1.24.1](https://ato.pxeger.com/run?1=m72soLIkIz9vwYKlpSVpuhY3N2TmFuQXlSgUVxZzQZl5pbkFlQqJxQp5BVxcBUWZeSUaQFm9stSi4sz8PE2oUF6BXnw8VCw-XpOLK1HBFqhDryq1KL9Yw1AHyCwFqjMzgWkoSi0o0kjUBKmMNohV0LVVMMQpA9esYYiuHeJuqPNh3gAA). – Kelly Bundy May 03 '23 at 21:58
  • FWIW, SageMathCell gives the same result, both in Python & Sage modes https://sagecell.sagemath.org/?z=eJzLzC3ILypRyCvNLahUSCxWyCvg5VLg5UpUsAUy9apSi_KLNQx1gMzSzLwSMxNNXq6CIiBLoyi1oEgjURPIByqONohV0LVVMMQnCTdCwxDDEABf3SZh&lang=python – PM 2Ring May 04 '23 at 06:56
0

All the existing answers are correct. I just want to add that on Windows 10 I got a different result, namely 9223372036854775808.

Steps to reproduce:

Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.13.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import numpy as np

In [2]: a = np.zeros(1,np.uint64)

In [3]: a
Out[3]: array([0], dtype=uint64)

In [4]: a[0] -= 1

In [5]: a
Out[5]: array([18446744073709551615], dtype=uint64)

In [6]: a[0] - 1
Out[6]: 1.8446744073709552e+19

In [7]: a[0] - 1 == 2**64
Out[7]: True

In [8]: a[0] -= 1
<ipython-input-8-9ab639258820>:1: RuntimeWarning: invalid value encountered in cast
  a[0] -= 1

In [9]: a
Out[9]: array([9223372036854775808], dtype=uint64)

In [10]: f'{a[0]:b}'
Out[10]: '1000000000000000000000000000000000000000000000000000000000000000'

In [11]: len(_)
Out[11]: 64

In [12]: a[0] == 2**63
Out[12]: True

In [13]: a[0] - 1
Out[13]: 9.223372036854776e+18

In [14]: a[0] - 1 == 2 ** 63
Out[14]: True

In [15]: a[0] -= 1

In [16]: a[0]
Out[16]: 9223372036854775808

In [17]: np.version.version
Out[17]: '1.24.2'

In binary, incrementing by one flips the last bit: a zero becomes a one, and a one becomes a zero with a carry into the bit to its left; the carry keeps propagating left until some bit goes from zero to one.

In uint64, if you subtract one from zero, the number can't get any smaller, so zero wraps around and is treated as 2^64; subtracting one from that gives 2^64 - 1, which in binary is '1' * 64 and in decimal 18446744073709551615.
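This wraparound is well defined when both operands are already uint64. A small sketch (NumPy emits a RuntimeWarning on scalar overflow, suppressed here for a clean run):

```python
import numpy as np
import warnings

with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)  # overflow warning from the wrap
    print(np.uint64(0) - np.uint64(1))  # 18446744073709551615, i.e. 2**64 - 1
```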

In [6]: a[0] - 1
Out[6]: 1.8446744073709552e+19

In [7]: a[0] - 1 == 2**64
Out[7]: True

Then, when the uint64 value is combined with a Python int, it is converted to a float, 1.8446744073709552e+19, which because of the format's limited precision is actually 2^64.

In [8]: a[0] -= 1
<ipython-input-8-9ab639258820>:1: RuntimeWarning: invalid value encountered in cast
  a[0] -= 1

In [9]: a
Out[9]: array([9223372036854775808], dtype=uint64)

Now this gets interesting: the maximum value uint64 can hold is 2^64 - 1, because 2^64 is a one followed by 64 zeros in binary, so it cannot be represented in uint64 at all, and casting the float value 2^64 back to uint64 is out of range.

That's why there is a warning.

But when doing the calculation, the out-of-range float is apparently converted to signed int64 first, and then reinterpreted as uint64.

The conversion to signed int64 is also out of range, and on x86 an out-of-range float-to-signed-int conversion produces the "integer indefinite" bit pattern '1' + '0' * 63: the leftmost bit is the sign bit, and a number is negative if the sign bit is set, so this is the most negative int64.

Because one bit is used for the sign the maximum value of int64 is 2^63-1 which is 9223372036854775807 in decimal.

When that int64 bit pattern '1' + '0' * 63 is reinterpreted as uint64, it holds the numerical value 2^63, which is 9223372036854775808 in decimal.

Then the number stays there no matter how many decrements I do: each operation converts the uint64 to a float with value 2^63, decrementing by one cannot change that value at this precision, and converting 2^63 back to uint64 is in range, so the stored value never moves.
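That last point is easy to check directly: near 2^63, adjacent float64 values are thousands apart, so subtracting one is lost to rounding (a small sketch):

```python
import numpy as np

x = np.float64(2**63)
print(np.spacing(x))  # gap to the next larger float64: 2048.0
print(x - 1 == x)     # True: 1 is far below float64 resolution at this magnitude
```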

Ξένη Γήινος
  • 2,181
  • 1
  • 9
  • 35
0

Possible workarounds

1. Explicit cast

a[0] -= np.uint64(1)

++

  • clean
  • fast

--

  • cumbersome

2. Fancy indexing

a[[0]] -= 1

+

  • easy to type

--

  • slow

3. Slice indexing

a[0:1] -= 1

-

  • mildly cumbersome
  • not the fastest
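A quick sanity check (a sketch) that each workaround keeps the arithmetic in uint64 and decrements as expected:

```python
import numpy as np

a = np.full(1, 5, dtype=np.uint64)

a[0] -= np.uint64(1)   # 1. explicit cast: scalar math stays in uint64
a[[0]] -= 1            # 2. fancy indexing: an array operation, so no float64 detour
a[0:1] -= 1            # 3. slice indexing: in-place array operation, dtype preserved
print(a, a.dtype)      # [2] uint64
```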
Albert.Lang
  • 523
  • 3
  • 8
-2

The behavior you are seeing is due to how unsigned integer arithmetic works in numpy. When an unsigned integer is decremented, if the result is negative, it "wraps around" to the maximum value of the data type.

In your example, a[0] starts at the value 0xFFFFFFFFFFFFFFFF, which is the maximum value for a 64-bit unsigned integer. When you subtract 1 from it, the result is 0xFFFFFFFFFFFFFFFE, as you expected. However, when you subtract 1 from it again, the result is -1 (which is represented as 0xFFFFFFFFFFFFFFFF in binary). Since this value is negative, it wraps around to the maximum value of the data type, which is 0.

So, the behavior you are seeing is expected due to the properties of unsigned integer arithmetic. If you want to avoid this behavior, you can use a signed integer data type instead.

JohnD9191
  • 9
  • 1
  • As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community May 07 '23 at 19:30