Efficient vectorized version of this numpy for loop

Question

Short intro
I have two paired lists of 2D numpy arrays (see below) - paired in the sense that index 0 in array1 corresponds to index 0 in array2. For each of the pairs I want to get all the combinations of all rows in the 2D numpy arrays, like answered by Divakar here.

Array example

arr1 = [
 np.vstack([[1,6,3,9], [8,5,6,7]]),
 np.vstack([[1,6,3,9]]),
 np.vstack([[1,6,3,9], [8,5,6,7],[8,5,6,7]])
]

arr2 = [
 np.vstack([[8,8,8,8]]),
 np.vstack([[8,8,8,8]]),
 np.vstack([[1,6,3,9], [8,5,6,7],[8,5,6,7]])
]

Working code

Note, unlike the linked answer my columns are fixed (always 4) hence I replaced using shape by the hardcode value 4 (or 8 in np.zeros).

def merge(a1, a2):
 # From: https://stackoverflow.com/questions/47143712/combination-of-all-rows-in-two-numpy-arrays
 m1 = a1.shape[0]
 m2 = a2.shape[0]
 out = np.zeros((m1, m2, 8), dtype=int)
 out[:, :, :4] = a1[:, None, :]
 out[:, :, 4:] = a2
 out.shape = (m1 * m2, -1)
 return out

total = np.concatenate([merge(arr1[i], arr2[i]) for i in range(len(arr1))])
print(total)

Question

While the above works fine, it looks inefficient to me as it:

involves looping through the arrays
"appends" (in list list comprehsion) to the total array, requiring it to allocate memory each time
creates multiple zero arrays (in the merge function), whereas I could create an empty one at the start? related to the point above

I perform this operation thousands of times on arrays with millions of elements, so any suggestions on how to transform this code into something more efficient?

Use `np.array([[1,2,3,4]])` instead of `vstack`. That's the more conventional way of defining an array. And for those of us who can't run your code mentally, or are too lazy to copy to our own computer, show some results. For example one or more of the `merge` calls, as well as the `total`. — hpaulj, Jan 04 '22 at 17:38
I doubt a much faster code exists assuming you want to work on such a small array. Numpy is not designed to be efficient on very small array. Moreover, lists of array almost require quite-slow python loops (either explicit or implicit ones). Having a better formatted input could help significantly. — Jérôme Richard, Jan 04 '22 at 20:20

score 1 · Answer 1 · answered Jan 04 '22 at 15:57

To be honest, this seems pretty hard to optimize. Each step in the loop has a different size, so likely there isn't any purely vectorized way of doing these things. You can try pre-allocating the memory and writing in place, rather than allocating many pieces and finally concatenating the results, but I'd bet that doesn't help you much (unless you are under such restrained conditions that you don't have enough RAM to store everything twice, of course).

Feel free to try the following approach on your larger data, but I'd be surprised if you get any significant speedup (or even that you don't get slower results!).

# Use scalar product to get the final size
result = np.zeros((np.dot([len(x) for x in arr1], [len(x) for x in arr2]), 8), dtype=int)
start = 0
for a1, a2 in zip(arr1, arr2):
    end = start + len(a1) * len(a2)
    result[start:end, :4] = np.repeat(a1, len(a2), axis=0)
    result[start:end, 4:] = np.tile(a2, (len(a1), 1))
    start = end

hpaulj · Answer 2 · 2022-01-04T22:03:33.940

This is what I wanted to see - the list and the merge results:

In [60]: arr1
Out[60]: 
[array([[1, 6, 3, 9],
        [8, 5, 6, 7]]),
 array([[1, 6, 3, 9]]),
 array([[1, 6, 3, 9],
        [8, 5, 6, 7],
        [8, 5, 6, 7]])]
In [61]: arr2
Out[61]: 
[array([[8, 8, 8, 8]]),
 array([[8, 8, 8, 8]]),
 array([[1, 6, 3, 9],
        [8, 5, 6, 7],
        [8, 5, 6, 7]])]

In [63]: merge(arr1[0],arr2[0])     # a (2,4) with (1,4) => (2,8)
Out[63]: 
array([[1, 6, 3, 9, 8, 8, 8, 8],
       [8, 5, 6, 7, 8, 8, 8, 8]])
In [64]: merge(arr1[1],arr2[1])    # a (1,4) with (1,4) => (1,8)
Out[64]: array([[1, 6, 3, 9, 8, 8, 8, 8]])
In [65]: merge(arr1[2],arr2[2])     # a (3,4) with (3,4) => (9,8)
Out[65]: 
array([[1, 6, 3, 9, 1, 6, 3, 9],
       [1, 6, 3, 9, 8, 5, 6, 7],
       [1, 6, 3, 9, 8, 5, 6, 7],
       [8, 5, 6, 7, 1, 6, 3, 9],
       [8, 5, 6, 7, 8, 5, 6, 7],
       [8, 5, 6, 7, 8, 5, 6, 7],
       [8, 5, 6, 7, 1, 6, 3, 9],
       [8, 5, 6, 7, 8, 5, 6, 7],
       [8, 5, 6, 7, 8, 5, 6, 7]])

And total is (12,8), combing all "rows".

The list comprehension is, more cleanly stated:

[merge(a,b) for a,b in zip(arr1,arr2)]

The lists, while the same length, have arrays with different numbers of rows, and the merge is also different.

People often ask about making an array iteratively, and we consistently say, collect the results in a list, and do one concatenate (like) construction at the end. The equivalent loop is:

In [70]: alist = []
    ...: for a,b in zip(arr1,arr2):
    ...:     alist.append(merge(a,b))

This is usually competitive with predefining the total array, and assigning rows. And in your case to get the final shape of total you'd have to iterate through the lists and record the number of rows, etc.

Unless the computation is trivial, the iteration mechanism is a minor part of the total time. I'm pretty sure that here, it's calling merge 3 times that's taking most of the time. For a task like this I wouldn't worry too much about memory use, including the creation of the zeros. You have to, in one way or other use memory for a (12,8) final result. Building that from a (2,8),(1,8), and (9,8) isn't a big issue.

The list comprehension with concatenate and without:

In [72]: timeit total = np.concatenate([merge(a,b) for a,b in zip(arr1,arr2)])
22.4 µs ± 29.8 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [73]: timeit [merge(a,b) for a,b in zip(arr1,arr2)]
16.3 µs ± 25.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Calling merge 3 times with any of the pairs takes about the same time.

Oh, another thing, don't try to 'reuse' the out array across merge calls. When accumulating results like this in a list, reuse of the arrays is dangerous. Each merge call must return its own array, not a "recycled" one.

Efficient vectorized version of this numpy for loop

2 Answers2