I was trying to use numpy.divmod with very large integers and noticed a strange behaviour. Around 2**63 ~ 1e19 (the limit of a signed 64-bit integer, which I assumed was the usual machine representation of an int in Python 3.5+), this happens:
from numpy import divmod

test = 10**6
for i in range(15, 25):
    x = 10**i
    print(i, divmod(x, test))
15 (1000000000, 0)
16 (10000000000, 0)
17 (100000000000, 0)
18 (1000000000000, 0)
19 (10000000000000.0, 0.0)
20 ((100000000000000, 0), None)
21 ((1000000000000000, 0), None)
22 ((10000000000000000, 0), None)
23 ((100000000000000000, 0), None)
24 ((1000000000000000000, 0), None)
Somehow, the quotient and remainder work fine up to 2**63; beyond that point, something different happens.
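A minimal probe of which dtype numpy infers for these scalars (I'm assuming np.asarray applies the same conversion that numpy's divmod does internally, which I haven't verified):

import numpy as np

# Probe the dtype numpy picks for each Python int; presumably the
# same conversion happens before divmod is applied (an assumption).
# On numpy 1.13 oversized ints become object arrays; newer versions
# may raise OverflowError instead.
for i in (18, 19, 20, 21):
    print(i, np.asarray(10**i).dtype)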
My guess is that beyond that point the int representation is "vectorized" (e.g. like BigInt in Scala, a little-endian Seq of Long). But then I'd expect divmod(array, test) to return a pair of arrays: the array of quotients and the array of remainders.
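For comparison, calling np.divmod on a genuine array does return exactly that pair (a quick sketch; the values here all fit in int64):

import numpy as np

# With a real int64 array, np.divmod returns two arrays:
# the quotients and the remainders.
arr = np.array([10**15, 10**16, 10**17])
q, r = np.divmod(arr, 10**6)
print(q, r)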
I have no clue what is going on here. It does not happen with the built-in divmod, where everything works as expected.
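For reference, the same loop with the built-in divmod, which handles arbitrary-precision Python ints and always returns a plain (quotient, remainder) tuple:

test = 10**6
for i in range(15, 25):
    x = 10**i
    print(i, divmod(x, test))  # exact integer results at every size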
Why does this happen? Does it have something to do with the internal representation of int?
Details: numpy version 1.13.1, Python 3.6