10

I am using NumPy like this:

>>> import numpy as np
>>> a = np.arange(1, 100000001).sum()
>>> a
987459712

I expected the result to be something like 5000000050000000.

I noticed that with numbers of up to five digits the result is correct. Does anyone know what is happening?

wjandrea
  • Cannot reproduce in python 2.7 with numpy@1.11.2 or python3.5 and numpy@1.12.0. What are you using? – Jblasco Apr 04 '17 at 14:10
  • Also, changing the title of the question to something more precise would be a good idea... – Jblasco Apr 04 '17 at 14:11
  • Yes, if you do `a=np.arange(1,100000001).sum()` it gives `5000000050000000` as result –  Apr 04 '17 at 14:16

2 Answers

10

NumPy is not making a mistake here. This phenomenon is known as integer overflow.

import numpy as np

x = np.arange(1, 100000001)
print(x.sum())  # 987459712 when the default integer type is 32-bit
print(x.dtype)  # int32

The 32-bit integer type that arange uses for the given input simply cannot hold 5000000050000000. The largest value it can hold is 2147483647 (2**31 - 1).
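To make the size mismatch concrete, here is a minimal sketch that compares the exact sum with the limits of the integer types. It uses the closed form n(n+1)/2 with plain Python integers instead of building the array, and the variable names are only illustrative.

```python
import numpy as np

n = 100_000_000
# Exact sum of 1..n via the closed form, computed with arbitrary-precision Python ints.
expected = n * (n + 1) // 2

int32_max = np.iinfo(np.int32).max  # largest value a signed 32-bit integer can hold
int64_max = np.iinfo(np.int64).max  # largest value a signed 64-bit integer can hold

print(expected)               # 5000000050000000
print(int32_max)              # 2147483647
print(expected > int32_max)   # True: the sum does not fit in int32
print(expected <= int64_max)  # True: the sum fits in int64
```

The expected sum exceeds the int32 limit by a factor of more than two million, so a 32-bit accumulator has no chance of representing it.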

If you explicitly use a larger integer or floating-point data type, you get the expected result.

a = np.arange(1, 100000001, dtype='int64').sum()
print(a)  # 5000000050000000

a = np.arange(1.0, 100000001.0).sum()
print(a)  # 5000000050000000.0
MB-F
  • Sounds like a "mistake" to me, especially in Python where long integers are used as necessary. Sure `numpy` does its own arithmetic, but as a numeric package it should be better at dealing with computations, not worse. – alexis Apr 04 '17 at 15:02
  • @alexis I don't think this is a mistake because it is documented behavior. The docs say that by default `arange` infers the data type from the input. Since `100000001` is small enough it seems reasonable to use `int32`. Actually, since I work on 32bit Python and use integer arrays primarily for indexing I do appreciate that it uses a pointer-sized data type by default. – MB-F Apr 04 '17 at 15:31
7

I suspect you are using Windows, where the data type of the result is a 32-bit integer (while for those using, say, Mac OS X or Linux, the data type is 64-bit). Note that 5000000050000000 % (2**32) = 987459712, which is exactly the value you got.
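The modular arithmetic above can be checked without NumPy. The sketch below uses a hypothetical helper, `wrap_to_int32`, which is not part of any library; it only mimics how a signed 32-bit integer wraps around, assuming two's complement storage.

```python
def wrap_to_int32(value):
    """Hypothetical helper: reduce a Python int to the value a signed
    32-bit (two's complement) integer would store after wrap-around."""
    wrapped = value % 2**32   # keep only the low 32 bits
    if wrapped >= 2**31:      # the upper half of the range represents negatives
        wrapped -= 2**32
    return wrapped


n = 100_000_000
true_sum = n * (n + 1) // 2     # exact sum of 1..n as a Python int

print(true_sum)                 # 5000000050000000
print(wrap_to_int32(true_sum))  # 987459712
```

The wrapped value matches the result reported in the question, which is consistent with the sum having been accumulated in a 32-bit integer.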

Try using

a = np.arange(1, 100000001, dtype=np.int64).sum()

or

a = np.arange(1, 100000001).sum(dtype=np.int64)

P.S. Anyone not using Windows can reproduce the result as follows:

>>> np.arange(1, 100000001).sum(dtype=np.int32)
987459712
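A smaller reproduction, which avoids allocating a 100-million-element array, is sketched below. It forces 32-bit integers on any platform and shows where the sum of 1..n first stops fitting; this may also explain the observation in the question that five-digit inputs still give correct results. The helper name `int32_sum_up_to` is made up for this example.

```python
import numpy as np


def int32_sum_up_to(n):
    """Hypothetical helper: sum 1..n using 32-bit integers for both
    the array elements and the accumulator."""
    return int(np.arange(1, n + 1, dtype=np.int32).sum(dtype=np.int32))


print(int32_sum_up_to(65535))  # 2147450880, still below 2147483647
print(int32_sum_up_to(65536))  # -2147450880, the sum has wrapped around
```

NumPy does not raise an error or warning for this kind of overflow in array reductions, so the wrapped value is returned silently.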
Warren Weckesser