One of the NumPy array flags is OWNDATA, which the documentation describes:

OWNDATA (O)
    The array owns the memory it uses or borrows it from another object.

I was wondering if there is any use at all for this flag, at least as a piece of information in the public API. There are some questions mentioning this flag, such as "How can I tell if NumPy creates a view or a copy?" or "Numpy reshape on view", which suggest that OWNDATA should generally not be used to determine whether an array is a copy or a view. But I have not found cases where the value of the flag is actually useful.

I was thinking about it with an example like this:

import numpy as np
a = np.tile([1], 3)
print(a)
# [1 1 1]
print(a.flags)
#   C_CONTIGUOUS : True
#   F_CONTIGUOUS : True
#   OWNDATA : False
#   WRITEABLE : True
#   ALIGNED : True
#   WRITEBACKIFCOPY : False
#   UPDATEIFCOPY : False

np.tile returns a new contiguous array containing the tiled input. In the example, a is indeed contiguous, but OWNDATA is False. Turns out the reason is that there is a reshape at the end of np.tile, so technically the data is owned by another array that was later reshaped into the result of the function. However, I have no references to that array, and in every respect I should consider a as owner of its data. I imagine if np.tile was natively implemented, maybe OWNDATA would be True. However, I don't know (and shouldn't know) which NumPy functions are native or not, so it seems to me that OWNDATA does not give any useful information to end users of the library. I'm not familiar with NumPy memory management and there is probably a reason to have that information internally, but I'm not so sure about having it as a (potentially misleading) publicly accessible array flag.
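A minimal sketch of how the hidden owner can still be reached through the `base` attribute. `find_owner` is a hypothetical helper written for this illustration, not a NumPy function, and whether `np.tile` returns a view at all depends on its implementation in the installed NumPy version:

```python
import numpy as np


def find_owner(arr):
    """Hypothetical helper: follow .base links to the object that owns the buffer."""
    obj = arr
    while getattr(obj, "base", None) is not None:
        obj = obj.base
    return obj


a = np.tile([1], 3)
owner = find_owner(a)

# Whatever np.tile does internally, the object at the end of the chain
# is the one that owns the memory.
print(a.flags.owndata, owner.flags.owndata)
print(owner is a)  # False when a is a view of a hidden intermediate array

# The same helper on a view whose owner is still referenced.
x = np.arange(12)
y = x.reshape(3, 4)
print(find_owner(y) is x)
```

The flag on `a` only says whether `a` itself allocated the buffer; the owner stays alive for as long as `a` holds a reference to it through `base`.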

Does anyone know about any actual, practical use of the OWNDATA flag?

EDIT: For clarification, I know that the value of OWNDATA is not related to whether the function that generates the array is native (compiled) or not. What I meant is that, while the array returned by np.tile does, functionally, own its data (since the actual owner of the data cannot be accessed anymore), the value of OWNDATA does not reflect that, and that, maybe, a compiled implementation of the function which didn't use intermediate ndarray objects might return an array with OWNDATA set to True. The point was that different implementation details may lead to different values of OWNDATA on otherwise functionally equivalent arrays, so it is not clear what the value of the flag OWNDATA is supposed to represent for a library user or how it may be useful.

jdehesa

1 Answer

I don't look at flags nearly as much as at __array_interface__ (esp. its data key).
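As a rough sketch of what that looks like in practice: the first element of `__array_interface__['data']` is the address of the array's first element, so views and copies can be told apart by comparing addresses. `data_ptr` is a hypothetical helper name used only for this illustration:

```python
import numpy as np


def data_ptr(arr):
    """Hypothetical helper: address of the first element of arr."""
    return arr.__array_interface__["data"][0]


x = np.arange(12)
y = x.reshape(3, 4)   # view on x's buffer
s = x[2:]             # view starting two elements into the buffer
z = y.copy()          # independent buffer

print(data_ptr(x) == data_ptr(y))                    # True
print(data_ptr(s) - data_ptr(x) == 2 * x.itemsize)   # True
print(data_ptr(z) == data_ptr(x))                    # False
```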

Whether a method/function is 'native' (compiled?) has nothing to do with OWNDATA.

In [16]: np.arange(12).flags['OWNDATA']                                                        
Out[16]: True
In [17]: np.arange(12).reshape(3,4).flags['OWNDATA']                                           
Out[17]: False
In [18]: np.arange(12).reshape(3,4).copy().flags['OWNDATA']                                    
Out[18]: True

reshape is fast compiled code, but it returns a view, a new array with its own shape and strides, but referencing the arange data buffer. That 1d arange array still exists even though I never assigned it to a variable.

The copy makes a new array with its own data. That copy is more expensive than the reshape, and usually not needed, unless I need to ensure full independence between arrays.
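When the real question is whether two arrays alias the same memory, `np.shares_memory` answers it directly. The sketch below (variable names are illustrative) shows that OWNDATA alone cannot: a reshape of an unnamed copy reports OWNDATA as False even though it is fully independent of `x`.

```python
import numpy as np

x = np.arange(12)
view = x.reshape(3, 4)                  # shares x's buffer
reshaped_copy = x.copy().reshape(3, 4)  # view of an unnamed, independent copy

for name, arr in [("view", view), ("reshaped_copy", reshaped_copy)]:
    print(name, arr.flags.owndata, np.shares_memory(x, arr))
# view False True
# reshaped_copy False False
```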

We can illustrate the consequence(s) of OWNDATA with:

In [19]: x = np.arange(12)                                                                     
In [20]: y = x.reshape(3,4)                                                                    
In [21]: z = y.copy()                                                                          
In [22]: z[0,:] *= 10                                                                          
In [23]: z                                                                                     
Out[23]: 
array([[ 0, 10, 20, 30],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
In [24]: x    # no change
Out[24]: array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
In [25]: y[0,:] *= 10                                                                          
In [26]: y                                                                                     
Out[26]: 
array([[ 0, 10, 20, 30],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
In [27]: x    # changing y changed x
Out[27]: array([ 0, 10, 20, 30,  4,  5,  6,  7,  8,  9, 10, 11])
hpaulj
  • Sorry, I think I wasn't clear in that part. I know that being a "native" (I meant C-compiled) function does not directly say anything about `OWNDATA`. What I meant is that if `np.tile` were a compiled function that does all the work, instead of a composition of other NumPy functions, it might return an array with `OWNDATA` set to True (since there would never have been any Python ndarray objects from which it was built). My point was that, functionally, `np.tile` returns an array which owns its data, but the `OWNDATA` flag does not reflect that due to implementation details. – jdehesa Mar 12 '20 at 15:33
  • They could have used `c.shape=shape_out; return c`. `np.tile(...).base` shows the `c` array as it was before the `reshape`. I suppose `tile` could have been implemented in a way that didn't require a reshape. From the code comments, they do worry about whether the result is a copy of `A` or not, but the setting of the OWNDATA flag is evidently not important. – hpaulj Mar 12 '20 at 16:09
  • https://stackoverflow.com/q/15424211/901925. The `resize` method is picky about ownership. A search of SO for OWNDATA doesn't produce a whole lot. Functions that return a reshape of a copy or unnamed array are common. – hpaulj Mar 12 '20 at 20:44
  • Yes, and I looked through the NumPy source code and I don't think there is any Python code where it is actually used. So my impression is that any possible benefit of knowing whether an array owns its data is voided by the fact that owners may not even exist as Python objects anymore (or at least as accessible ones). I suppose that preventing that would require ndarray's `__del__` transferring ownership to one of its views, but that looks quite tricky to get perfectly right, and probably not worth the effort anyway... – jdehesa Mar 13 '20 at 10:04
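
The `resize` remark in the comments can be checked with a short sketch. It assumes that the in-place `ndarray.resize` raises `ValueError` for an array that does not own its data when the total size changes; `refcheck=False` is passed only to skip the separate reference-count check, and the variable names are illustrative:

```python
import numpy as np

owner = np.arange(6)                # OWNDATA is True
view = np.arange(6).reshape(2, 3)   # OWNDATA is False

# Growing an owning array in place works; new entries are zero-filled.
owner.resize(8, refcheck=False)
print(owner)  # [0 1 2 3 4 5 0 0]

# The same call on a view is refused because the buffer belongs to another array.
refused = False
try:
    view.resize(8, refcheck=False)
except ValueError as exc:
    refused = True
    print("resize refused:", exc)
```

This is one place where the flag has a visible effect for users: an array that does not own its buffer cannot reallocate it.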