
In numpy, a Hadamard (element-wise) product on a vertical stack of 1D arrays is significantly faster than looping through the list of 1D arrays and performing the product on each one (which makes sense, and I tested it anyway).
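
For reference, a minimal sketch of that comparison (the array sizes and counts here are made up for illustration):

import timeit
import numpy as np

arrays = [np.random.rand(4) for _ in range(100)]  # hypothetical data
factors = 2.0 ** np.arange(100)
buf = np.vstack(arrays)

def looped():
    # one small multiply per array: per-call overhead dominates
    for a, f in zip(arrays, factors):
        a * f

def stacked():
    # a single vectorized multiply over the whole 2D stack
    buf * factors[:, None]

print(timeit.timeit(looped, number=10000))
print(timeit.timeit(stacked, number=10000))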

I have a situation in which I need to perform the Hadamard product between one set of numpy arrays and another, like so:

stacked_arrays = np.vstack([1D-arrays...])
stacked_arrays *= np.power(factor, np.arange(num_arrays))[:, None]  # column so the factors broadcast across rows
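
This works, but note that np.vstack copies its inputs, so the in-place multiply only touches the stacked buffer. A quick sketch of the problem (example data made up):

import numpy as np

factor = 2.0
arrays = [np.array([1.0, 2.0, 4.0, 5.0]) for _ in range(4)]
stacked = np.vstack(arrays)    # vstack copies the component arrays
stacked *= np.power(factor, np.arange(len(arrays)))[:, None]
print(arrays[1])               # still [1. 2. 4. 5.]: the originals are unchanged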

However, I need this operation to mutate each component 1D array in the list, and it needs to happen a lot. I know this sounds like a strange requirement, but is there any way to do it without a loop like:

factors = factor ** np.arange(num_arrays)
for array, f in zip([1D..arrays], factors):
    array *= f

or without unstacking the result of the operation?

Also, map can't be used, since map creates copies of the numpy arrays:

result = map(lambda x, y: x * y, [1D..arrays], factors)

Since you can't use *= inside a lambda, a list of new numpy arrays is returned, leaving the originals unmutated.
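
A quick sketch of that failure mode (placeholder data):

import numpy as np

arrays = [np.ones(4) for _ in range(3)]   # hypothetical originals
factors = 2.0 ** np.arange(3)

result = list(map(lambda x, y: x * y, arrays, factors))
print(arrays[0])   # still all ones: map produced new arrays, not in-place updates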

Is there a way to get np.vstack to still reference the old component arrays somehow, or an alternative way to get the speed of the Hadamard product on stacked arrays while still mutating the unstacked ones? Some time could be saved if unstacking (np.split) didn't need to happen at all.

TimeIt results:

import timeit
import numpy as np

m = []
for _ in range(100):
    m.append(np.array([1, 2, 4, 5], dtype=np.float64))
factors = np.expand_dims(np.power(2, np.arange(100, dtype=np.float64)), axis=1)

def stack_and_unstack():
    l = np.vstack(m)
    l *= factors
    result = np.split(l, 100)  # split back into 100 views of the rows

def stack_only():
    l = np.vstack(m)
    l *= factors

print(timeit.timeit(stack_and_unstack, number=10000))
# 1.8569015570101328

print(timeit.timeit(stack_only, number=10000))
# 0.9328480050317012

# makes sense that with unstacking it is about double the time

Clarification: the list of [1D arrays] mentioned above is a sublist of a larger list of 1D arrays. That larger list is a collections.deque, and the deque needs to be shuffled before the sublist is extracted (i.e. this is an experience replay buffer for stochastic gradient descent).

Buffer pop and append speed:

import collections

times = int(1e4)
tgt = np.array([1, 2, 3, 4])

queue = collections.deque([tgt] * times, maxlen=times)
reg_list = [tgt] * times
numpy_list = np.array([tgt] * times)

def pop():
    queue.pop()
def pop_list():
    reg_list.pop()
def pop_np():
    global numpy_list
    numpy_list = numpy_list[1:]

print(timeit.timeit(pop, number=times))
# 0.0008135469979606569
print(timeit.timeit(pop_list, number=times))
# 0.000994370027910918
print(timeit.timeit(pop_np, number=times))
# 0.0016436030273325741

def push():
    queue.append(tgt)
def push_list():
    reg_list.append(tgt)
numpy_list = np.array([tgt] * 1)  # reset to a single row; push_np overwrites it in place
def push_np():
    numpy_list[0] = tgt

print(timeit.timeit(push, number=times))
# 0.0008797429618425667
print(timeit.timeit(push_list, number=times))
# 0.00097957398975268
print(timeit.timeit(push_np, number=times))
# 0.003331452957354486
dylan7
  • What is stopping you from indexing the 2D stacked array as if it was a list? – Mad Physicist Dec 30 '18 at 02:18
  • "since you can't do *= with lambda": Instead of `lambda x, y: x * y` (which could just be `np.multiply`), try `lambda x, y: np.multiply(x, y, out=x)`. Remember, `*=` returns `self` for reassignment just as `*` does. – Mad Physicist Dec 30 '18 at 02:20
  • @MadPhysicist see my clarification for your first comment. But based on what you said, I guess I would keep it as a 2D stacked array, shuffle with `np.random.shuffle`, and to make it act as a `deque` do `2DArray = 2DArray[-min(2DArray.shape[0], DEQUE_SIZE):, :]` after each time a new row is appended. – dylan7 Dec 30 '18 at 04:34
  • Do you strictly need a deque? Do you know how many rows you will end up with in the end? – Mad Physicist Dec 30 '18 at 04:48
  • 1
    Thing is, you don't need to shuffle the deque, just the indices into it. – Mad Physicist Dec 30 '18 at 04:50
  • Well, I need a deque since there can be an undefined number of rows (an increasing number of rows until the program decides to terminate based on a condition), but only a fixed set of rows should be kept at any given time, and rows should be discarded after `N` new rows. – dylan7 Dec 30 '18 at 05:10
  • In that case, my solution works perfectly. You just need to maintain a fixed size rotating buffer of size N. – Mad Physicist Dec 30 '18 at 05:49
  • 1
    I've updated my answer with some details you may find interesting/relevant. I'm going to leave the circular buffer implementation up to you. Let me know if you post any related questions. – Mad Physicist Dec 30 '18 at 15:27

1 Answer

Let's break the problem down. You want to have a list of references to arrays that are all mutable, but you want to be able to perform operations on them as a block.

I would maintain that your approach is backwards. Rather than trying to pack and unpack your arrays into separate buffers, maintain views into a single buffer.

Replace your current loop

m = [np.array([1, 2, 4, 5], dtype=np.float64) for _ in range(100)]

with a single buffer, and views into each row:

buf = np.vstack([np.array([1, 2, 4, 5], dtype=np.float64) for _ in range(100)])
m = list(buf)  # check that m[0].base is buf

Now you have a list of arrays m, each of which you can modify individually. As long as you keep the modifications in-place and don't reassign list elements, all the changes will appear directly in buf. At the same time, you can perform your batch calculations on buf, and as long as you do them in-place, m will reflect all the changes.
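
A minimal sketch of that aliasing behavior (example data made up):

import numpy as np

buf = np.vstack([np.array([1.0, 2.0, 4.0, 5.0]) for _ in range(3)])
m = list(buf)          # m[i] is a view into row i of buf

m[0] *= 10             # in-place change through the view...
print(buf[0])          # ...is visible in the buffer: [10. 20. 40. 50.]

buf *= 2               # in-place batch operation on the buffer...
print(m[0])            # ...is visible through the view: [20. 40. 80. 100.]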

Realistically, you may not even need m. Notice how list(buf) creates views into each row of the buffer. You can just as easily index directly into the buffer. For example, m[3][8] would usually be written in terms of buf as buf[3, 8], but you can also use the fact that buf is a sequence and write buf[3][8]. This is less efficient, since it creates a new view (buf[3]) every time, but not by much.

To extract a shuffled subset of the rows, you can use sequence indexing as well. Let's say your buffer retains the last M rows, which you want to shuffle and extract a subsequence of N rows from. You can do this by creating an array of indices, and shuffling those over and over:

indices = np.arange(M)

# inside the loop:

np.random.shuffle(indices)
chunk = buf[indices[:N]]
# do your math on `chunk`

You don't need to reallocate or re-sort indices as long as M doesn't change and you believe that the shuffle is sufficiently random.

Mad Physicist
  • So one thing I realized regarding the circular buffer, turning the 2D array into a circular buffer sacrifices the [efficiency of deque](https://www.reddit.com/r/Python/comments/1wehgw/pythons_deque_is_the_stack_object_you_want_to_use/). I've seen some implementations of creating a Numpy circular buffer and [its efficiency](https://stackoverflow.com/a/49800160/4234140). I did some tests (above) comparing pop and append for list, deque and np.array (really setitem since it's a circular buffer), and list and deque don't differ much, but np.array is slow. – dylan7 Dec 30 '18 at 18:44
  • update: so I did find another implementation of the circular buffer using numpy which claims it is faster than `deque` : https://github.com/ageron/handson-ml/blob/master/16_reinforcement_learning.ipynb at notebook entry 63. – dylan7 Dec 30 '18 at 22:02
  • @dylan. Why are you using append or insert? The block you have is already a circular buffer. You just have to keep track of which row you are writing to and have a flag that triggers once the buffer is full. It's much more efficient than a deque at that point (see the sketch after these comments). – Mad Physicist Dec 31 '18 at 00:03
  • @dylan. Since the original question wasn't really clear about your purpose, I recommend that you ask another one, showing your research, regarding how to maintain the buffer. There's a lot of scope creep already going on here. – Mad Physicist Dec 31 '18 at 00:05
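
For completeness, a minimal sketch of the fixed-size rotating buffer described in the comment above (all names are made up; this is one possible implementation, not the answerer's code):

import numpy as np

class RingBuffer:
    """Hypothetical fixed-size rotating row buffer over a single 2D array."""

    def __init__(self, capacity, row_size):
        self.buf = np.empty((capacity, row_size), dtype=np.float64)
        self.next_row = 0    # index of the next row to overwrite
        self.full = False    # flips once every row has been written at least once

    def append(self, row):
        self.buf[self.next_row] = row    # overwrite in place, no reallocation
        self.next_row = (self.next_row + 1) % self.buf.shape[0]
        if self.next_row == 0:
            self.full = True

    def sample(self, n):
        # pick n distinct rows uniformly from the region written so far
        limit = self.buf.shape[0] if self.full else self.next_row
        idx = np.random.choice(limit, size=n, replace=False)
        return self.buf[idx]    # fancy indexing returns a copy to do math on

Appending overwrites a row of the preallocated block in place, which avoids the per-append object overhead of a deque of separate arrays.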