Python 3.7.4 (default, Aug 13 2019, 20:35:49) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
>>> np.version.version
'1.17.2'

I keep integers as bit arrays so that I can manipulate them at the bit level. I use a fairly standard way to convert a bit array to an integer and then view it in decimal format.

def decimal(binaryValue):
    decimalValue = 0
    for bit in binaryValue:
        decimalValue = (decimalValue << 1) | bit
        print(decimalValue) #For testing
    return decimalValue

I was getting negative integers, seemingly at random, for certain arrays of 64 bits or more. After some hair pulling and debugging I realised that this only happens when I use a NumPy array; there is no issue with regular lists. Here is a specific example:

>>> b = [1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1]
>>> decimal(b)
1
3
7
...
4418570559336253839
8837141118672507678
17674282237345015357
17674282237345015357
>>> import numpy as np
>>> b_np = np.array(b)
>>> b_np
array([1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0,
       1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1,
       0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1])
>>> decimal(b_np)
1
3
7
...
4418570559336253839
8837141118672507678
-772461836364536259
-772461836364536259
>>> np.binary_repr(decimal(b))
'1111010101000111101010100000110101110000111100100010011000111101'
>>> np.binary_repr(decimal(b_np))
'-101010111000010101011111001010001111000011011101100111000011'

As you can see, with the NumPy array something goes wrong when the last bit is evaluated. If I convert the NumPy array back to a list (c below), I still get a negative number, even though c compares equal to b. Very weird. Something happens in NumPy space.

>>> c = list(b_np)
>>> c
[1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1]
>>> decimal(c)
1
3
7
...
4418570559336253839
8837141118672507678
-772461836364536259
-772461836364536259
>>> np.binary_repr(decimal(c))
'-101010111000010101011111001010001111000011011101100111000011'

Simple checks:

>>> len(b)
64
>>> len(b_np)
64
>>> len(c)
64
>>> b == c
True
>>> b == b_np
array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True])

What is going on? It is obviously a 64-bit issue, but I can't see where it comes from. Thanks.

stochashtic

1 Answer


When you convert your b variable, the accumulator decimalValue stays a Python int. When you do the same with a numpy.array such as b_np, each bit is a numpy.int64, and int | numpy.int64 yields a numpy.int64, so decimalValue becomes a numpy.int64 after the first iteration.
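A minimal sketch of that type change, using a short three-bit array instead of the 64-bit one. The dtype is pinned to numpy.int64 here as an assumption, to match the Linux setup in the question; it also shows why list(b_np) in the question still misbehaves, since list() keeps the NumPy scalars while tolist() converts them to Python ints.

```python
import numpy as np

b = [1, 0, 1]
b_np = np.array(b, dtype=np.int64)  # dtype pinned for illustration

print(type(b[0]))              # Python int
print(type(b_np[0]))           # numpy.int64
print(type(list(b_np)[0]))     # list() keeps NumPy scalars: numpy.int64
print(type(b_np.tolist()[0]))  # tolist() converts to Python int

# One step of the loop from the question: the accumulator starts as a
# Python int but takes on the NumPy scalar type after the first "|".
acc = 0
acc = (acc << 1) | b_np[0]
print(type(acc))               # numpy.int64
```

Because numpy.int64(1) == 1 is true, b == list(b_np) still evaluates to True, which is why the equality checks in the question do not reveal the difference.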

int type

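Python 3's int has arbitrary precision, so it can hold any of these values. For comparison, the largest value that fits in an unsigned 64-bit word is sys.maxsize * 2 + 1 (on a 64-bit build):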

import sys
print(sys.maxsize * 2 + 1)
# Prints 18446744073709551615 (length 20)

This means that your first result, 17674282237345015357, needs all 64 bits but still fits in an unsigned 64-bit word, because 17674282237345015357 < 18446744073709551615.
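As a quick check of the sizes involved, here is a small sketch using only built-in int methods; the variable names are illustrative.

```python
# The result obtained from the plain Python list in the question.
result = 17674282237345015357

# It needs exactly 64 bits, one more than a signed 64-bit type can use
# for the magnitude.
print(result.bit_length())          # 64

# Python ints simply grow as needed, so shifting further is harmless.
bigger = result << 64
print(bigger.bit_length())          # 128
print(bigger >> 64 == result)       # True
```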

numpy.int64 type

This datatype can represent integers from -9223372036854775808 to 9223372036854775807; you can check this with np.iinfo(np.int64).
Since 9223372036854775807 (length 19) is smaller than your expected result 17674282237345015357, the value does not fit: the 64th bit is taken as the sign bit, which is why only your last result is negative. If you try the same example with an array of length 62 (`2**62` representable values), you will see that you won't get negative values.
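The wrap-around can be reproduced with plain arithmetic: reading the same 64 bits as a signed value subtracts 2**64 whenever the top bit is set. The sketch below shows that, followed by one possible fix (an assumption, not the only option): casting each bit with int() so the accumulator stays a Python int. The snake_case names are illustrative.

```python
import numpy as np

expected = 17674282237345015357

# Reinterpret the 64-bit pattern as a signed (two's complement) value.
wrapped = expected - 2**64 if expected >= 2**63 else expected
print(wrapped)  # -772461836364536259, the value seen in the question


def decimal(binary_value):
    """Fold 0/1 bits (most significant first) into a Python int."""
    decimal_value = 0
    for bit in binary_value:
        # int(bit) converts a NumPy scalar to an arbitrary-precision int,
        # so the accumulator never becomes a fixed-width numpy.int64.
        decimal_value = (decimal_value << 1) | int(bit)
    return decimal_value


bits = np.ones(64, dtype=np.int64)  # 64 one-bits: too large for int64
print(decimal(bits))                # 18446744073709551615, i.e. 2**64 - 1
```

Calling decimal(b_np.tolist()) with the original function would have the same effect, since tolist() yields Python ints.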

Cblopez
  • Python 3's `int` type allows calculations with arbitrarily long numbers, so there is no need to check that the value is `< sys.maxsize * 2 + 1`, but `numpy.int64` indeed has limitations. – V. Ayrat Apr 19 '20 at 12:19
  • Absolutely, but that calculation gives the max word size. This post has a really interesting answer about that: https://stackoverflow.com/questions/7604966/maximum-and-minimum-values-for-ints – Cblopez Apr 19 '20 at 12:21
  • [replacing previous comment] Thanks! I knew it had to do with a 64-bit limit, but couldn't see where. Since I set `decimalValue = 0`, which is a Python 3 `int`, I didn't see the issue. I completely missed that `int | numpy.int64` is of type `numpy.int64`. I would have expected the right operand to be upcast to `int` instead. I have switched my array to a standard Python list. It creates some inefficiencies downstream, but I need > 64-bit precision for this purpose. – stochashtic Apr 24 '20 at 12:27