Numpy is calculating wrong

Question

I am using NumPy as in the following code:

>>> import numpy as np
>>> a = np.arange(1, 100000001).sum()
>>> a
987459712

I expected the result to be something like 5000000050000000.

I noticed that the result is correct for upper bounds of up to five digits. Does anyone know what is happening?

Cannot reproduce in python 2.7 with numpy@1.11.2 or python3.5 and numpy@1.12.0. What are you using? — Jblasco, Apr 04 '17 at 14:10
Also, changing the title of the question to something more precise would be a good idea... — Jblasco, Apr 04 '17 at 14:11
Yes, if you do `a=np.arange(1,100000001).sum()` it gives `5000000050000000` as result — , Apr 04 '17 at 14:16

score 10 · Answer 1 · edited May 23 '17 at 12:32

NumPy is not making a mistake here. This phenomenon is known as integer overflow.

import numpy as np

# On a platform where NumPy's default integer is 32 bits wide:
x = np.arange(1, 100000001)
print(x.sum())  # 987459712
print(x.dtype)  # int32

The 32-bit integer type that arange uses for the given input simply cannot hold 5000000050000000. The largest value it can hold is 2147483647 (2**31 - 1).
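As a minimal sketch of that limit, the snippet below forces the dtype explicitly so it behaves the same on every platform. The smaller upper bound `n = 100_000` is an assumption chosen only to keep the array small; it is not the value from the question, but its sum already exceeds the 32-bit range.

```python
import numpy as np

# Largest value a signed 32-bit integer can represent.
int32_max = np.iinfo(np.int32).max
print(int32_max)  # 2147483647

# Illustrative upper bound (smaller than in the question, to keep memory use low).
n = 100_000

# Summing in 32 bits silently wraps around once the total passes int32_max.
wrapped_sum = np.arange(1, n + 1, dtype=np.int32).sum(dtype=np.int32)
print(wrapped_sum)  # 705082704

# Summing the same values with a 64-bit accumulator gives the true total.
exact_sum = np.arange(1, n + 1, dtype=np.int32).sum(dtype=np.int64)
print(exact_sum)  # 5000050000
```

Note that NumPy raises no error or warning for this kind of overflow in an array reduction, which is why the wrong total can go unnoticed.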

If you explicitly use a larger integer or a floating-point data type, you get the expected result.

a = np.arange(1, 100000001, dtype='int64').sum()
print(a)  # 5000000050000000

a = np.arange(1.0, 100000001.0).sum()
print(a)  # 5000000050000000.0

answered Apr 04 '17 at 14:25

MB-F


Sounds like a "mistake" to me, especially in Python where long integers are used as necessary. Sure `numpy` does its own arithmetic, but as a numeric package it should be better at dealing with computations, not worse. – alexis Apr 04 '17 at 15:02
@alexis I don't think this is a mistake because it is documented behavior. The docs say that by default `arange` infers the data type from the input. Since `100000001` is small enough it seems reasonable to use `int32`. Actually, since I work on 32bit Python and use integer arrays primarily for indexing I do appreciate that it uses a pointer-sized data type by default. – MB-F Apr 04 '17 at 15:31

Warren Weckesser · Answer 2 · 2017-04-04T14:32:07.073

I suspect you are using Windows, where the data type of the result is a 32-bit integer (while for those using, say, Mac OS X or Linux, the data type is 64-bit). Note that 5000000050000000 % (2**32) = 987459712.
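The modular arithmetic above can be checked without NumPy, using Python's arbitrary-precision integers. This is only a sketch: `wrap_to_int32` is a hypothetical helper written for this illustration, not a NumPy function, and it assumes the usual two's-complement interpretation of a signed 32-bit value.

```python
def wrap_to_int32(value):
    """Reinterpret a Python int as a signed 32-bit two's-complement value."""
    low_bits = value % 2**32  # keep only the lowest 32 bits
    # Bit patterns at or above 2**31 represent negative numbers.
    return low_bits - 2**32 if low_bits >= 2**31 else low_bits


n = 100_000_000
exact_sum = n * (n + 1) // 2  # closed form for 1 + 2 + ... + n
print(exact_sum)                 # 5000000050000000
print(wrap_to_int32(exact_sum))  # 987459712
```

Here the remainder happens to be below 2**31, so the wrapped value comes out positive. For other inputs a 32-bit sum can just as easily wrap to a negative number.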

Try using

a = np.arange(1, 100000001, dtype=np.int64).sum()

or

a = np.arange(1, 100000001).sum(dtype=np.int64)

P.S. Anyone not using Windows can reproduce the result as follows:

>>> np.arange(1, 100000001).sum(dtype=np.int32)
987459712
