
Given the following code:

import numpy as np
c = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
]

c = np.array(c)
print((c * c.transpose()).prod())

On my Windows machine it returns -1462091776 (not sure how it got a negative from all those positives). On Ubuntu it returns 131681894400.

Anyone know what's going on here?

Edit: Apparently this is an overflow problem (thanks @rafaelc!). But it is reproducible (thanks also to @richardec for testing that).

So now the question becomes: is this a bug I should report? Who do I report it to?

Mr. Enigma

1 Answer


I have enough comments that I think an "answer" is warranted.

What happened?

Not sure how it got a negative from all those positives

As @rafaelc points out, you ran into an integer overflow. You can read more details at the Wikipedia link that was provided.
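
To see what wrap-around looks like on a small example (a minimal sketch; whether NumPy also emits a RuntimeWarning here varies between versions):

import numpy as np

# A signed 32-bit integer tops out at 2147483647; adding 1 wraps
# around to the most negative representable value.
x = np.array([2147483647], dtype=np.int32)
print(x + 1)  # [-2147483648]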

What caused the overflow?

According to this thread, numpy uses the operating system's C long type as the default dtype for integers. So when you write this line of code:

c = np.array(c)

The dtype falls back to numpy's default integer type, which is the operating system's C long. The size of a long in Microsoft's C implementation for Windows is 4 bytes (4 bytes × 8 bits/byte = 32 bits), so your dtype defaults to a 32-bit integer.
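
You can check which default you actually got on your machine (a quick sketch; at the time of the question this printed int32 on Windows and int64 on Linux, though newer numpy releases have since moved to a 64-bit default on Windows as well):

import numpy as np

c = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(c.dtype)  # int32 on the asker's Windows machine, int64 on Linux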

Why did this calculation overflow?

In [1]: import numpy as np

In [2]: np.iinfo(np.int32)
Out[2]: iinfo(min=-2147483648, max=2147483647, dtype=int32)

The largest number a 32-bit, signed integer data type can represent is 2147483647. If you break your product down, first element-wise and then across one axis:

In [5]: c * c.T
Out[5]:
array([[ 1,  8, 21],
       [ 8, 25, 48],
       [21, 48, 81]])

In [6]: (c * c.T).prod(axis=0)
Out[6]: array([  168,  9600, 81648])

In [7]: 168 * 9600 * 81648
Out[7]: 131681894400

You can see that 131681894400 >> 2147483647 (in mathematics, the notation >> means "is much, much larger"). Since 131681894400 is much larger than the maximum integer the 32-bit long can represent, an overflow occurs.
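
In fact, you can reproduce the exact negative number from the question by keeping only the low 32 bits of the true product and reinterpreting them as a signed (two's-complement) value; here's a quick sketch in plain Python:

true_product = 131681894400

# Keep the low 32 bits, then map values >= 2**31 to their negative
# two's-complement counterparts.
wrapped = true_product % 2**32
if wrapped >= 2**31:
    wrapped -= 2**32

print(wrapped)  # -1462091776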

But it's fine in Linux

In Linux, a long is 8 bytes (8 bytes × 8 bits/byte = 64 bits). Why? Here's an SO thread that discusses this in the comments.
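
If you're curious what a C long is on your own machine, the standard library's ctypes module will tell you (a small sketch):

import ctypes

# Prints 4 on Windows and 8 on a typical 64-bit Linux system
print(ctypes.sizeof(ctypes.c_long))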

"Is it a bug?"

No, although it's pretty annoying, I'll admit.

For what it's worth, it's usually a good idea to be explicit about your data types, so next time:

c = np.array(c, dtype='int64')

# or
c = np.array(c, dtype=np.int64)
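
With the dtype pinned to 64 bits, the product fits comfortably and both platforms should give the same answer (a quick sketch reusing the data from the question):

import numpy as np

c = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=np.int64)
print((c * c.T).prod())  # 131681894400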

Who do I report a bug to?

Again, this isn't a bug, but if it were, you'd open an issue on the numpy GitHub repository (where you can also peruse the source code). Somewhere in there is proof of how numpy uses the operating system's default C long, but I don't have it in me to go digging around to find it.

ddejohn