function taking ownership of Numpy array

Question

I have a function takes_ownership() that performs an operation on a Numpy array a in place and effectively relies on becoming the owner of the data. Is there a way to alter the passed array such that it no longer points to its original buffer?

The motivation is a code that processes huge chunks of data in place (due to performance constraints) and a common error is unaware users recycling their input arrays.

Note that the question is very specifically not "why is a different if I change b = a in my function". The question is "how can I make using the given array in place safer for unsuspecting users" when I must not make a copy.

def takes_ownership(a):
    b = a

    # looking for this step
    a.set_internal_buffer([])

    # if this were C++ std::vectors, I would be looking for
    # b = np.array([])
    # a.swap(b)

    # an expensive operation that invalidates a
    b.resize((6, 6), refcheck=False)

    # outside references to a no longer valid
    return b


a = np.random.randn(5, 5)

b = takes_ownership(a)

# array no longer has data so that users cannot mess up
assert a.shape = ()

Checking back on this question an hour later, I don't believe there is a clean way to do what you want. I don't have experience with numpy, but [this](https://stackoverflow.com/questions/9954413/swap-array-data-in-numpy) appears to answer your question. Basically, have your arrays in a wrapper, then exchange out the arrays in the wrapper. Numpy doesn't appear to expose a way of simply swapping the underlying structure it holds. — Carcigenicate, Jun 10 '21 at 21:38

score -1 · Answer 1 · answered Jun 10 '21 at 20:37

-1

NumPy has a copy function that will clone an array (although if this is an object array, i.e. not primitive, there might still be nested object references after cloning). That being said, this is a questionable design pattern and it would probably be better practice to rewrite your code in a way that does not rely on this condition.

Edit: If you can't copy the array, you will probably need to make sure that none of your other code modifies it in place (instead running immutable operations to produce new arrays). Doing things this way may cause more issues down the road, so I'd recommend refactoring so that it is not necessary.

answered Jun 10 '21 at 20:37

generic_stackoverflow_user

316
3
10

The question is precisely how to "make sure that none of your other code modifies it". – ntessore Jun 10 '21 at 20:52
Right, but you cannot operate on a "different" array unless you (a) copy the whole array or (b) use an immutable operation that returns a new array. If you copy the array correctly, I can't see it being a major detractor from your program's performance - still, as [Carcigenicate said](https://stackoverflow.com/questions/67927952/function-taking-ownership-of-numpy-array/67928147?noredirect=1#comment120065368_67927952), this is a rather strange way of implementing whatever functionality you are trying to design. – generic_stackoverflow_user Jun 10 '21 at 23:05

score -1 · Answer 2 · answered Jun 10 '21 at 20:44

You can use the numpy's copy, which will absolutely do what you want.

Use the code b = np.copy(a) and no further changes are needed.

If you want to make a copy of an object, in general, so that you can call methods on that object then you can use the copy module.

From the linked page:

A shallow copy constructs a new compound object and then (to the extent possible) inserts references into it to the objects found in the original.

A deep copy constructs a new compound object and then, recursively, inserts copies into it of the objects found in the original.

In this case, in your code import copy and then use b = copy.copy(a) and you will get a shallow copy (Which I think should be good enough for numpy arrays, but you'll want to check that yourself).

The hanging question here is why this is needed. Python prefers to use "pass by reference" when calling a function. The assignment operator = does not actually call any constructor for the object on the left-hand side of the operator, rather it assigns it to the reference of the object on the right-hand side. So, when you call a method on a reference (using the dot operator .), either a or b are going to call the same object in memory unless you make a new object with an explicit copy command.

This does not address the question, which is not "how do I copy `a`". — ntessore, Jun 10 '21 at 20:52
I feel like this did address the question prior to the edit. But, what you probably need is the proper vocabulary to ask the question more clearly. I recommend you check this out: https://www.tutorialsteacher.com/python/public-private-protected-modifiers — VoNWooDSoN, Jun 24 '21 at 21:02

function taking ownership of Numpy array

2 Answers2