
I am using numpy to do the always fun "count the triangles in an adjacency matrix" task. (Given an nxn Adjacency matrix, how can one compute the number of triangles in the graph (Matlab)?)
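For context, a minimal sketch of the trace-based approach (assuming an undirected simple graph stored as a 0/1 matrix; each triangle contributes 6 closed walks of length 3, hence the division by 6):

```python
import numpy as np

# Triangle on vertices 0-2, plus an isolated vertex 3.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 0, 0, 0]], dtype=np.int64)  # explicit int64 to avoid overflow on large graphs

A3 = A @ A @ A                 # cube of the adjacency matrix
triangles = np.trace(A3) // 6  # each triangle is counted 6 times on the diagonal
print(triangles)               # → 1
```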

Given my matrix A, numpy.matmul() computes the cube of A without problem, but for a large matrix numpy.trace() returns a negative number.

I extracted the diagonal using numpy.diagonal() and summed the entries with numpy.sum() and also with a for loop -- both returned the same negative number as numpy.trace().

An attempt with math.fsum() finally returned the (presumably correct) number 4,088,103,618 -- a seemingly small number both for Python and for my 64-bit operating system, especially since the Python docs state that integer values are unlimited.

Surely this is an overflow or undefined-behavior issue, but where does the inconsistency come from? I ran the test in the following post to validate that my system architecture is 64-bit, so numpy should also be a 64-bit package: Do I have Numpy 32 bit or 64 bit?

To visualize the summation process, print statements were added to the for loop; the output appears as follows, with an asterisk marking the interesting line.

.
.
.
adding diag val 2013124 to the running total 2140898426 = 2142911550
adding diag val 2043358 to the running total 2142911550 = 2144954908
adding diag val 2035410 to the running total 2144954908 = 2146990318
adding diag val 2000416 to the running total 2146990318 = -2145976562 *
adding diag val 2062276 to the running total -2145976562 = -2143914286
adding diag val 2092890 to the running total -2143914286 = -2141821396
adding diag val 2092854 to the running total -2141821396 = -2139728542
.
.
.

Why would adding 2000416 to 2146990318 create an overflow? The sum is only 2148990734 -- a very small number for python!


2 Answers


NumPy doesn't use the "Python types" but rather underlying C types, which you have to specify to meet your needs. By default, an array of integers is given the "int_" type, which per the docs:

int_ Default integer type (same as C long; normally either int64 or int32)

That's why you're seeing the overflow. You'll have to specify some other type when you construct your array so that it doesn't overflow.
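For example, a quick sketch using the values from the question (the exact default type varies by platform, but an explicit int64 keeps the sum correct):

```python
import numpy as np

# The two values from the asterisked line in the question.
diag = np.array([2146990318, 2000416], dtype=np.int64)
print(diag.sum())  # → 2148990734, no wraparound past 2**31 - 1
```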


When you do the addition with scalars you probably get a Warning:

>>> import numpy as np
>>> np.int32(2146990318) + np.int32(2035410)
RuntimeWarning: overflow encountered in long_scalars
-2145941568

So yes, it is overflow-related. The maximum 32-bit integer is 2,147,483,647!

To make sure your arrays support a bigger range of values you could cast the array (I assume you operate on an array) to int64 (or a floating point value):

array = array.astype('int64')  # makes sure the values are 64 bit integers

or when creating the array:

import numpy as np
array = np.array(something, dtype=np.int64)

NumPy uses fixed-size integers; these aren't arbitrary-precision integers. By default it's either a 32-bit or a 64-bit integer, depending on your system. For example, Windows uses int32 even when Python + NumPy are compiled for 64-bit.
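You can check the limits (and your platform's default integer dtype) with np.iinfo -- just a sanity check, not specific to this problem:

```python
import numpy as np

print(np.iinfo(np.int32).max)  # 2147483647 -- the boundary hit in the question
print(np.iinfo(np.int64).max)  # 9223372036854775807
print(np.array([1]).dtype)     # the default integer dtype on this platform
```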

  • It will use whatever the size a C long is on the platform. Which on windows, even on 64bit machines, [will be 32 bits](https://msdn.microsoft.com/en-us/library/windows/desktop/aa383751(v=vs.85).aspx). – juanpa.arrivillaga May 13 '17 at 18:37
  • @learnedOnPascal No problem, please don't forget to [accept](https://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work) the answer that solved your problem (if any answer did). – MSeifert May 13 '17 at 18:51
  • @juanpa.arrivillaga You're right, but the question did not specify the OS, so I only included it as additional information. – MSeifert May 13 '17 at 18:53