
I'm confused by the result of numpy reshape applied to a view. In the session below, q.flags shows that q does not own its data, yet q.base is neither x nor y, so what is it? I'm also surprised that q.strides is (8,), which means (if I understand correctly) that each next element is reached by moving 8 bytes in memory. But if none of the arrays other than x owns data, the only data buffer is x's, and that buffer's layout does not allow stepping through the elements of q 8 bytes at a time.

In [99]: x = np.random.rand(4, 4)

In [100]: y = x.T

In [101]: q = y.reshape(16)

In [102]: q.base is y
Out[102]: False

In [103]: q.base is x
Out[103]: False

In [104]: y.flags
Out[104]: 
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

In [105]: q.flags
Out[105]: 
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

In [106]: q.strides
Out[106]: (8,)

In [107]: x
Out[107]: 
array([[ 0.62529694,  0.20813211,  0.73932923,  0.43183722],
       [ 0.09755023,  0.67082005,  0.78412615,  0.40307291],
       [ 0.2138691 ,  0.35191283,  0.57455781,  0.2449898 ],
       [ 0.36476299,  0.36590522,  0.24371933,  0.24837697]])

In [108]: q
Out[108]: 
array([ 0.62529694,  0.09755023,  0.2138691 ,  0.36476299,  0.20813211,
        0.67082005,  0.35191283,  0.36590522,  0.73932923,  0.78412615,
        0.57455781,  0.24371933,  0.43183722,  0.40307291,  0.2449898 ,
        0.24837697])

UPDATE:

It turns out that this question has been asked in the numpy discussion forum: http://numpy-discussion.10968.n7.nabble.com/OWNDATA-flag-and-reshape-views-vs-copies-td10363.html

shaoyl85

2 Answers


In short: you cannot always rely on ndarray.flags['OWNDATA'] alone to tell whether an array is a view of the array you started from.

>>> import numpy as np
>>> x = np.random.rand(2,2)
>>> y = x.T
>>> q = y.reshape(4)
>>> y[0,0]
0.86751629121019136
>>> y[0,0] = 1
>>> q
array([ 0.86751629,  0.87671107,  0.65239976,  0.41761267])
>>> x
array([[ 1.        ,  0.65239976],
       [ 0.87671107,  0.41761267]])
>>> y
array([[ 1.        ,  0.87671107],
       [ 0.65239976,  0.41761267]])
>>> y.flags['OWNDATA']
False
>>> x.flags['OWNDATA']
True
>>> q.flags['OWNDATA']
False
>>> np.may_share_memory(x,y)
True
>>> np.may_share_memory(x,q)
False

Because q, unlike x and y, did not reflect the change to the first element, its data must live in a separate buffer (how that fits with OWNDATA being False is explained below).

There is more discussion about the OWNDATA flag on the numpy-discussion mailing list. In the question "How can I tell if NumPy creates a view or a copy?", it is briefly mentioned that simply checking flags.owndata of an ndarray sometimes seems to fail and seems unreliable, as you mention. That's because every ndarray also has a base attribute:

The base of an ndarray is a reference to another array if the memory originated elsewhere (otherwise, the base is None). The operation y.reshape(4) creates a copy, not a view, because the strides of y are (8, 16). To read y in C order as a (4,) array, the memory pointer would have to jump through byte offsets 0 -> 16 -> 8 -> 24, which cannot be done with a single stride. Thus q.base refers to the array created by the forced copy inside y.reshape: it has the same shape as y, but its elements were copied, so it has normal C-contiguous strides again: (16, 8). No other name is bound to q.base, since it only exists as an intermediate result of y.reshape(4). Only this copy can be viewed with shape (4,), because its strides allow it. q is then indeed a view on q.base.
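The hidden copy described above can be inspected directly. This is a minimal sketch, assuming float64 data (the default for np.random.rand) and the reshape behavior discussed in this answer; the printed values are what I would expect under those assumptions, not a guarantee for every NumPy version.

```python
import numpy as np

x = np.random.rand(2, 2)
y = x.T               # view of x with swapped strides
q = y.reshape(4)      # cannot be expressed as a view of x's buffer

print(y.strides)                    # strides of the transposed view
print(q.base is None)               # q refers to some other array
print(q.base.shape)                 # shape of the intermediate copy
print(q.base.strides)               # C-contiguous strides of that copy
print(q.base.flags.owndata)         # the copy owns the buffer
print(np.shares_memory(x, q))       # q does not alias x
print(np.shares_memory(q.base, q))  # q aliases the intermediate copy
```

So q is a view, just not a view of x or y.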

For most people it would be confusing to see that q.flags.owndata is False, because, as shown above, q is not a view on y. However, it is a view on a copy of y, and that copy, q.base, is the owner of the data. Thus the flags are actually correct, if you inspect closely.
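One way to check which object actually owns the buffer is to follow the base chain until it ends. The helper below, find_owner, is a hypothetical name of my own and not part of NumPy; it is only a sketch and assumes the chain ends at an object whose base is None or that has no base attribute.

```python
import numpy as np

def find_owner(arr):
    """Follow .base links until reaching an object without a base (hypothetical helper)."""
    owner = arr
    while getattr(owner, "base", None) is not None:
        owner = owner.base
    return owner

x = np.random.rand(4, 4)
y = x.T
q = y.reshape(16)

print(find_owner(y) is x)        # y is a view of x
print(find_owner(q) is x)        # q does not trace back to x
print(find_owner(q) is q.base)   # the unnamed copy is the owner
```

Comparing find_owner(a) and find_owner(b) only tells you whether two arrays trace back to the same owner; np.shares_memory is the better check for actual overlap.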

Oliver W.
  • Thank you! Then is q.base a copied array with the memory layout needed to represent q, with q a view of it, while q.base is not bound to any other Python name? – shaoyl85 Mar 05 '15 at 21:09
  • I looked at the numpy-discussion post. It seems to me that when it says "owndata is not reliable", it is referring to whether the memory will be deallocated when the ndarray is deallocated. I guess that as an indicator of whether the memory buffer used is ndarray.data or its ndarray.base.data, it is still accurate. It's just that in this case the q array has an unnamed base array which actually owns the data. Is my understanding correct? – shaoyl85 Mar 05 '15 at 21:23
  • @shaoyl85, you are correct. My reply to both your comments would've been too long to add in the comments, so I added it in the post. – Oliver W. Mar 05 '15 at 23:04
  • Thank you! I think this makes sense, and I guess the behavior of "creating a copy and then returning a view" is a bit weird and unexpected. The more consistent behavior would be to just return a copy of the array when it is not possible to return a view of the input array. – shaoyl85 Mar 06 '15 at 03:40

I like to use .__array_interface__.

In [811]: x.__array_interface__
Out[811]: 
{'data': (149194496, False),
 'descr': [('', '<f8')],
 'shape': (4, 4),
 'strides': None,
 'typestr': '<f8',
 'version': 3}

In [813]: y.__array_interface__
Out[813]: 
{'data': (149194496, False),
 'descr': [('', '<f8')],
 'shape': (4, 4),
 'strides': (8, 32),
 'typestr': '<f8',
 'version': 3}

In [814]: x.strides
Out[814]: (32, 8)
In [815]: y.strides
Out[815]: (8, 32)

Transpose was performed by reversing the strides. The base data pointer is the same.
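The same check can be done without reading the dictionaries by eye. A small sketch, assuming a fresh 4x4 float64 array; ptr_x and ptr_y are just local names for the data pointers reported by __array_interface__.

```python
import numpy as np

x = np.random.rand(4, 4)
y = x.T

# First element of the 'data' tuple is the address of the first array element.
ptr_x = x.__array_interface__["data"][0]
ptr_y = y.__array_interface__["data"][0]

print(ptr_x == ptr_y)          # same starting address
print(x.strides, y.strides)    # strides are reversed
print(y.base is x)             # y is a view of x
```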

In [817]: q.__array_interface__
Out[817]: 
{'data': (165219304, False),
 'descr': [('', '<f8')],
 'shape': (16,),
 'strides': None,
 'typestr': '<f8',
 'version': 3}

So the data of q is a copy (different pointer). 'strides': None in the interface means the array is C-contiguous, so q.strides is (8,): its elements are accessed by stepping from one f8 to the next. By contrast, x.reshape(16) is a view of x, because its data can be accessed with a simple 8-byte step.

To access the original data in the q order, it would have to step 32 bytes 3 times (down x rows), then go back to the start and step 8 to the 2nd x column, followed by 3 row steps, etc. Since striding doesn't work this way, it has to work from a copy.
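The stepping pattern described above can be computed from the strides of y. This is only an illustrative sketch, assuming float64 data; offsets is a name introduced here for the byte offset of each element of y, visited in C order.

```python
import numpy as np

x = np.random.rand(4, 4)
y = x.T

# Byte offset of y[i, j] from the start of the buffer:
# i * y.strides[0] + j * y.strides[1]
i, j = np.indices(y.shape)
offsets = (i * y.strides[0] + j * y.strides[1]).ravel()  # C-order walk over y

print(offsets)
print(np.unique(np.diff(offsets)))  # more than one step size is needed
```

A 1-D view needs one constant step between consecutive elements, so seeing two distinct step sizes here is why the flattened result cannot alias the buffer of x.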

Note also that y[0,0] changes x[0,0], but q[0] is independent of both.

While OWNDATA for q is False, it is True for y.ravel() and y.flatten(). I suspect reshape() in this case makes a copy and then reshapes that copy, so it is the intermediate copy, q.base, that 'owns' the data.
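The three ways of flattening y can be compared side by side. A rough sketch, assuming the behavior observed in this answer; the OWNDATA and base values are printed rather than taken for granted, since they reflect implementation details.

```python
import numpy as np

x = np.random.rand(4, 4)
y = x.T

q = y.reshape(16)
r = y.ravel()
f = y.flatten()

# For each result: does it own its data, does it have a base, does it alias x?
for name, arr in [("reshape", q), ("ravel", r), ("flatten", f)]:
    print(name, arr.flags.owndata, arr.base is None, np.shares_memory(arr, x))
```

All three hold the same values and none of them aliases x; they differ only in whether the result owns the new buffer or is a view of an unnamed copy.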

hpaulj
  • Thank you! Are you aware of any rationale of "making a copy then reshaping" rather than just "return a copy when not possible to return a view"? – shaoyl85 Mar 06 '15 at 03:41