3

I have many large multidimensional NP arrays (2D and 3D) used in an algorithm. There are numerous iterations in this, and during each iteration the arrays are recalculated by performing calculations and saving into temporary arrays of the same size. At the end of a single iteration the contents of the temporary arrays are copied into the actual data arrays.

Example:

global A, B # ndarrays
A_temp = numpy.zeros(A.shape)
B_temp = numpy.zeros(B.shape)
for i in xrange(num_iters):
    # Calculate new values from A and B storing in A_temp and B_temp...
    # Then copy values from temps to A and B
    A[:] = A_temp
    B[:] = B_temp

This works fine, however it seems a bit wasteful to copy all those values when A and B could just swap. The following would swap the arrays:

A, A_temp = A_temp, A
B, B_temp = B_temp, B

However there can be other references to the arrays in other scopes which this won't change.

It seems like NumPy could have an internal method for swapping the internal data pointer of two arrays, such as numpy.swap(A, A_temp). Then all variables pointing to A would be pointing to the changed data.

coderforlife
  • 1,378
  • 18
  • 31
  • Can you give an example of "other references to the arrays in other scopes which this won't change"? – dmytro Mar 31 '12 at 09:49
  • For example, for the calculation step I could (and will) have multiple threads, called with a function where the fields are passed as arguments. To reduce the overhead of instantiating multiple threads they cycle as well. – coderforlife Apr 01 '12 at 03:57
  • if I recall correctly, you can't pass the data between threads/processes without copy unless you're using specific features like sharedctypes arrays or so. But I might be wrong... – dmytro Apr 01 '12 at 22:04
  • Sharing between threads is not difficult, in fact does not require any extra work. It just requires you to be extra careful about race conditions (and since I am only reading in the loop and only one thread is writing the arrays this is not really an issue). For multiprocessing some additional work is required, but the Scipy site says "It is possible to share memory between processes, including numpy arrays." (http://www.scipy.org/ParallelProgramming). – coderforlife Apr 04 '12 at 04:05
  • indeed, you need to use sharedctypes arrays for that (I only worked with multiprocessing so I ain't sure about threading of course) – dmytro Apr 06 '12 at 13:17

3 Answers3

2

Even though you way should work as good (I suspect the problem is somewhere else), you can try doing it explicitly:

import numpy as np
A, A_temp = np.frombuffer(A_temp), np.frombuffer(A)

It's not hard to verify that your method works as well:

>>> import numpy as np
>>> arr = np.zeros(100)
>>> arr2 = np.ones(100)
>>> print arr.__array_interface__['data'][0], arr2.__array_interface__['data'][0]
152523144 152228040

>>> arr, arr2 = arr2, arr
>>> print arr.__array_interface__['data'][0], arr2.__array_interface__['data'][0]
152228040 152523144

... pointers succsessfully switched

dmytro
  • 1,293
  • 9
  • 21
  • The frombuffer idea is interesting, however it does require a "reshape(A.shape)" command since it only returns 1D arrays. Besides that though this does not solve the issue with alternate references however. I tried a few different things with it, and what is needed a "setbuffer" function... This has given me the idea to look into views though (which I don't really know too much about). – coderforlife Apr 01 '12 at 04:19
1

I realize this is an old question, but for what it's worth you could also swap data between two ndarray buffers (without a temp copy) by performing an xor swap:

A_bytes = A.view('ubyte')
A_temp_bytes = A.view('ubyte')
A_bytes ^= A_temp_bytes
A_temp_bytes ^= A_bytes
A_bytes ^= A_temp_bytes

Since this was done on views, if you look at the original A and A_temp arrays (in whatever their original dtype was) their values should be correctly swapped. This is basically equivalent to the numpy.swap(A, A_temp) you were looking for. It's unfortunate that it requires 3 loops--if this were implemented as a ufunc (maybe it should be) it would be a lot faster.

Iguananaut
  • 21,810
  • 5
  • 50
  • 63
1

Perhaps you could solve this by adding a level of indirection.

You could have an "array holder" class. All that would do is keep a reference to the underlying NumPy array. Implementing a cheap swap operation for a pair of these would be trivial.

If all external references are to these holder objects and not directly to the arrays, none of those references would get invalidated by a swap.

NPE
  • 486,780
  • 108
  • 951
  • 1,012
  • This is definitely a viable alternative. It just seems silly since the ndarray already has a pointer to the proper data that could be changed... if I cannot figure out a way using views this will be the answer. – coderforlife Apr 01 '12 at 04:21
  • @thaimin: The Pythonic way to swap two things is `a, b = b, a`. This swaps the references. I can't think of a single standard class that implements a swap interface like the one you're looking for. It's just not done in Python since it's almost never needed. – NPE Apr 01 '12 at 06:24
  • I understand. I attempted to use views which would work if it let you re-assign the "base" field. I attempted to assign the "data" field which almost worked but doesn't allow swapping (although the first assignment does exactly what I want and a.data, b.data = b.data, a.data does swap "one" of them). – coderforlife Apr 04 '12 at 04:07