13

I have a dataset to which I'm trying to apply some arithmetic. The thing is, it gives me relatively large numbers, and when I do it with numpy, they are stored as 0.

The weird thing is, when I compute the numbers separately, they have an int value; they only become zeros when I compute them using numpy.

x = np.array([18,30,31,31,15])
10*150**x[0]/x[0]
Out[1]: 36298069767006890

vector = 10*150**x/x
vector
Out[2]: array([0, 0, 0, 0, 0])

I have of course checked their types:

type(10*150**x[0]/x[0]) == type(vector[0])
Out[3]: True

How can I compute these large numbers using numpy without seeing them turned into zeros?

Note that if we remove the factor 10 at the beginning, the problem changes slightly (but I think it might be for a similar reason):

x = np.array([18,30,31,31,15])
150**x[0]/x[0]
Out[4]: 311075541538526549

vector = 150**x/x
vector
Out[5]: array([-329406144173384851, -230584300921369396, 224960293581823801,
   -224960293581823801, -368934881474191033])

The negative numbers indicate that the largest value of the int64 type in Python has been exceeded, don't they?

ysearka
  • Could you use floating point numbers `np.array([18.0, 30, 31, 31, 15])` instead of int? – kennytm May 17 '16 at 09:17
  • 1
    No, do not use float values. They may appear to work but their precision will be horrible at those value ranges. Your computations work but the result is wrong (and you don't notice). – Nils Werner May 17 '16 at 09:21

2 Answers

21

As Nils Werner already mentioned, numpy's native fixed-width C integer types cannot hold numbers that large, but Python itself can, since its int objects use an arbitrary-precision implementation. So what you can do is tell numpy not to convert the numbers to its native types but to use the Python objects instead. This will be slower, but it will work.

In [14]: x = np.array([18,30,31,31,15], dtype=object)

In [15]: 150**x
Out[15]: 
array([1477891880035400390625000000000000000000L,
       191751059232884086668491363525390625000000000000000000000000000000L,
       28762658884932613000273704528808593750000000000000000000000000000000L,
       28762658884932613000273704528808593750000000000000000000000000000000L,
       437893890380859375000000000000000L], dtype=object)

In this case the numpy array will not store the numbers themselves but references to the corresponding int objects. When you perform arithmetic operations they won't be performed on the numpy array but on the objects behind the references.
I think you're still able to use most of the numpy functions with this workaround but they will definitely be a lot slower than usual.
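
For instance (a minimal sketch, not part of the original answer), the computation from the question goes through with this dtype; floor division `//` is used here, assuming an exact integer result is what's wanted:

import numpy as np

x = np.array([18, 30, 31, 31, 15], dtype=object)
# the arithmetic is delegated to Python's arbitrary-precision ints, so nothing overflows
vector = 10 * 150**x // x
print(vector)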

But that's what you get when you're dealing with numbers that large :D
Maybe somewhere out there is a library that can deal with this issue a little better.

Just for completeness, if precision is not an issue, you can also use floats:

In [19]: x = np.array([18,30,31,31,15], dtype=np.float64)

In [20]: 150**x
Out[20]: 
array([  1.47789188e+39,   1.91751059e+65,   2.87626589e+67,
         2.87626589e+67,   4.37893890e+32])
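
To see the precision trade-off the comments warn about (a hedged sketch, assuming you actually need the exact integers): round-tripping values of this size through float64 only preserves roughly 15-16 significant digits:

import numpy as np

exact = 150**np.array([18, 30, 31, 31, 15], dtype=object)   # exact Python ints
as_float = exact.astype(np.float64)                         # only ~15-16 significant digits survive
# expected to print False for numbers of this magnitude
print(int(as_float[0]) == exact[0])
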
swenzel
  • 1
    Interesting approach to use a `numpy.array(dtype=object)`. Will keep that in mind. – Nils Werner May 17 '16 at 10:22
  • The dtype=object option seems like a good solution in general. In my case it might be a little more difficult since I then have to apply scipy.special functions such as psi (digamma function) which works on numpy.array but not with the dtype=object option. – ysearka May 17 '16 at 10:57
  • In general you can't count on numpy math operations to work with `dtype=object`. The fast operations use compiled code - code that works with various standard numeric datatypes. But with `object`, the array actually contains pointers - to objects elsewhere in memory. In effect such an array is a glorified list (or debased one?). – hpaulj May 17 '16 at 23:59
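
To illustrate the last comment (a small sketch, not from the thread): compiled ufuncs generally refuse object arrays, because with `dtype=object` there is no fixed-width numeric data for the compiled loop to work on:

import numpy as np

obj = np.array([18, 30, 31, 31, 15], dtype=object)
try:
    np.exp(obj)   # numpy looks for an exp() method on each Python int and finds none
except (TypeError, AttributeError) as err:
    print("ufunc does not work on this object array:", err)
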
3

150 ** 18 is already way beyond what an int64 variable can represent (it's in the ballpark of 1.5e39, while the maximum possible value of an unsigned int64 is roughly 1.8e19).

Python may be using an arbitrary length integer implementation, but NumPy doesn't.

As you deduced correctly, negative numbers are a symptom of an int overflow.
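
As a quick check of those limits (a minimal sketch, not part of the original answer):

import numpy as np

print(np.iinfo(np.int64).max)   # 9223372036854775807, roughly 9.2e18
print(150**18)                  # an exact Python int, roughly 1.5e39, far beyond int64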

Nils Werner
  • Then is there a way to give numpy a different integer implementation? I could compute the numbers one after another using vanilla python, but that would take very long, and I'd really rather avoid it. – ysearka May 17 '16 at 09:29
  • And it seems weird that the type of 150**x[0]/x[0] is shown as numpy.int64 if vanilla python doesn't use the same integer implementation. Would it mean that it does the computation in one type and then stores it in another? – ysearka May 17 '16 at 09:32