Unexpected result with += on NumPy arrays

Question

I am creating symmetric matrices/arrays in Python with NumPy, using a standard method:

x = rand(500,500)
x = (x+x.T)
all(x==x.T)
> True

Now let's be clever:

x = rand(500,500)
x += x.T
all(x==x.T)
> False

Wait, what?

x==x.T
> array([[ True,  True,  True, ..., False, False, False],
       [ True,  True,  True, ..., False, False, False],
       [ True,  True,  True, ..., False, False, False],
       ..., 
       [False, False, False, ...,  True,  True,  True],
       [False, False, False, ...,  True,  True,  True],
       [False, False, False, ...,  True,  True,  True]], dtype=bool)

The upper left and lower right segments are symmetrical. What if I chose a smaller array?

x = rand(50,50)
x += x.T
all(x==x.T)
> True

OK....

x = rand(90,90)
x += x.T
all(x==x.T)
> True

x = rand(91,91)
x += x.T
all(x==x.T)
> False

And just to be sure...

x = rand(91,91)
x = (x+x.T)
all(x==x.T)
> True

Is this a bug, or am I about to learn something crazy about += and NumPy arrays?

@AndrewJaffe this is Numpy 1.9 on Python 3.4.1, as distributed in Anaconda. — jeffalstott, Oct 09 '14 at 12:34
@jeffalstott yep, I misread the question -- I do see the behaviour too. — Andrew Jaffe, Oct 09 '14 at 12:37
see related: http://stackoverflow.com/questions/11524664/how-can-i-tell-if-numpy-creates-a-view-or-a-copy — EdChum, Oct 09 '14 at 13:00

jdehesa · Accepted Answer · 2014-10-09T12:37:19.820

The transpose operation returns a view of the array, which means that no new array is allocated. Which, in turn, means that you are reading and modifying the array at the same time. It's hard to tell why some sizes or some areas of the result work, but most likely it has to do with how numpy deals with array addition (maybe it makes copies of submatrices) and/or array views (maybe for small sizes it does create a new array).

The x = x + x.T operation works because there you are creating a new array and then assigning to x, of course.

score 20 · Answer 2 · answered Oct 09 '14 at 14:19

The implementation detail mentioned by others is called buffering. You can read more about it in the docs on array iteration.

If you look at your failing example in a little more detail:

>>> a = np.random.rand(91, 91)
>>> a += a.T
>>> a[:5, -1]
array([ 0.83818399,  1.06489316,  1.23675312,  0.00379798,  1.08967428])
>>> a[-1, :5]
array([ 0.83818399,  1.06489316,  1.75091827,  0.00416305,  1.76315071])

So the first wrong value is 90*91 + 2 = 8192, which, unsurprisingly, is what we get from:

>>> np.getbufsize()
8192

And we can also set it higher, and then:

>>> np.setbufsize(16384)  # Must be a multiple of 16
8192  # returns the previous buffer size
>>> a = np.random.rand(91, 91)
>>> a += a.T
>>> np.all(a == a.T)
True

Although now:

>>> a = np.random.rand(129, 129)
>>> a += a.T
>>> np.all(a == a.T)
False

This is of course a dangerous thing to rely on for correctness, as it is indeed an implementation detail subject to change.

score 4 · Answer 3 · answered Oct 09 '14 at 12:36

The problem is that the addition is not done "at once"; x.T is a view of the x so as you start adding to each element of x, x.T is mutated. This messes up later additions.

The fact it works for sizes below (91, 91) is almost definitely an implementation detail. Using

x = numpy.random.rand(1000, 1000)
x += x.T.copy()
numpy.all(x==x.T)
#>>> True

fixes that, but then you don't really have any space-benefit.

Unexpected result with += on NumPy arrays

3 Answers3

Linked

Related