Matrix element wise multiplication with shifted columns

Question

Say I have two arrays, A and B.

An element wise multiplication is defined as follows:

I want to do an element-wise multiplication in a convolutional-like manner, i.e., move every column one step right, for example, column 1 will be now column 2 and column 3 will be now column 1. This should yield a ( 2 by 3 by 3 ) array (2x3 matrix for all 3 possibilities)

Did either of the posted solutions work for you? – Divakar Sep 30 '19 at 17:39 — Divakar, Sep 30 '19 at 17:39

Divakar · Answer 1 · 2019-09-29T09:01:42.667

We can concatenate A with one of it's own slice and then get those sliding windows. To get those windows, we can leverage np.lib.stride_tricks.as_strided based scikit-image's view_as_windows. Then, multiply those windows with B for the final output. More info on use of as_strided based view_as_windows.

Hence, we will have one vectorized solution like so -

In [70]: from skimage.util.shape import view_as_windows

In [71]: A1 = np.concatenate((A,A[:,:-1]),axis=1)

In [74]: view_as_windows(A1,A.shape)[0]*B
Out[74]: 
array([[[1, 0, 3],
        [0, 0, 6]],

       [[2, 0, 1],
        [0, 0, 4]],

       [[3, 0, 2],
        [0, 0, 5]]])

We can also leverage multi-cores with numexpr module for the final step of broadcasted-multiplication, which should be better on larger arrays. Hence, for the sample case, it would be -

In [53]: import numexpr as ne

In [54]: w = view_as_windows(A1,A.shape)[0]

In [55]: ne.evaluate('w*B')
Out[55]: 
array([[[1, 0, 3],
        [0, 0, 6]],

       [[2, 0, 1],
        [0, 0, 4]],

       [[3, 0, 2],
        [0, 0, 5]]])

Timings on large arrays comparing the proposed two methods -

In [56]: A = np.random.rand(500,500)
    ...: B = np.random.rand(500,500)

In [57]: A1 = np.concatenate((A,A[:,:-1]),axis=1)
    ...: w = view_as_windows(A1,A.shape)[0]

In [58]: %timeit w*B
    ...: %timeit ne.evaluate('w*B')
1 loop, best of 3: 422 ms per loop
1 loop, best of 3: 228 ms per loop

Squeezing out the best off strided-based method

If you really squeeze out the best off the strided-view-based approach, go with the original np.lib.stride_tricks.as_strided based one to avoid the functional overhead off view_as_windows -

def vaw_with_as_strided(A,B):
    A1 = np.concatenate((A,A[:,:-1]),axis=1)
    s0,s1 = A1.strides
    S = (A.shape[1],)+A.shape
    w = np.lib.stride_tricks.as_strided(A1,shape=S,strides=(s1,s0,s1))
    return w*B

Comparing against @Paul Panzer's array-assignment based one, the crossover seems to be at 19x19 shaped arrays -

In [33]: n = 18
    ...: A = np.random.rand(n,n)
    ...: B = np.random.rand(n,n)

In [34]: %timeit vaw_with_as_strided(A,B)
    ...: %timeit pp(A,B)
10000 loops, best of 3: 22.4 µs per loop
10000 loops, best of 3: 21.4 µs per loop

In [35]: n = 19
    ...: A = np.random.rand(n,n)
    ...: B = np.random.rand(n,n)

In [36]: %timeit vaw_with_as_strided(A,B)
    ...: %timeit pp(A,B)
10000 loops, best of 3: 24.5 µs per loop
10000 loops, best of 3: 24.5 µs per loop

So, for anything smaller than 19x19, array-assignment seems to be better and for larger than those, strided-based one should be the way to go.

`view_as_windows`/`as_strided` seem to have massive constant overhead, so is only worth it for large enough arrays. 30x30 is about where it starts beating my copy-based approach. — Paul Panzer, Sep 29 '19 at 05:22
@PaulPanzer Not sure what you are getting at. That `view_as_windows`/`as_strided` facilitates 3D view and since the final o/p would be 3D anyway, so that seems like a reasonable/efficient way. And which copy-based approach are we comparing it against? — Divakar, Sep 29 '19 at 05:43

score 2 · Answer 2 · answered Sep 29 '19 at 06:43

Just a note on view_as_windows/as_strided. Neat as these functions are, it is useful to know that they have a rather pronounced constant overhead. Here is comparison between @Divakar's view_as_windows based solution (vaw) and a copy-reshape based approach by me.

As you can see vaw is not very fast on small to medium sized operands and only begins to shine above array size 30x30.

Code:

from simple_benchmark import BenchmarkBuilder, MultiArgument
import numpy as np
from skimage.util.shape import view_as_windows

B = BenchmarkBuilder()

@B.add_function()
def vaw(A,B):
    A1 = np.concatenate((A,A[:,:-1]),axis=1)
    w = view_as_windows(A1,A.shape)[0]
    return w*B

@B.add_function()
def pp(A,B):
    m,n = A.shape
    aux = np.empty((n,m,2*n),A.dtype)
    AA = np.concatenate([A,A],1)
    aux.reshape(-1)[:-n].reshape(n,-1)[...] = AA.reshape(-1)[:-1]
    return aux[...,:n]*B

@B.add_arguments('array size')
def argument_provider():
    for exp in range(4, 16):
        dim_size = int(1.4**exp)
        a = np.random.rand(dim_size,dim_size)
        b = np.random.rand(dim_size,dim_size)
        yield dim_size, MultiArgument([a,b])

r = B.run()
r.plot()

import pylab
pylab.savefig('vaw.png')

Good effort. You were including the functional overhead of `view_as_windows`. Edited with native strided method. Now the crossover is at 19x19. Check out my post for the timings. — Divakar, Sep 29 '19 at 09:02
@Divakar Whoa, didn't expect such a large difference between `as_strided` and `view_as_windows`. — Paul Panzer, Sep 29 '19 at 09:12

score 1 · Answer 3 · answered Sep 28 '19 at 16:01

1

Run a for loop for the number of columns and use np.roll() around axis =1, to shift your columns and do the matrix multiplication.

refer to the accepted answer in this reference. Hope this helps.

answered Sep 28 '19 at 16:01

Amit Gupta

2,698
4
24
37

Cool, I didn't know about `np.roll` yet! – Energya Sep 28 '19 at 16:02
Thanks, this is what I was doing but I wondered if there's a better, more efficient way since I'm working with large matrices – Jenia Golbstein Sep 28 '19 at 16:03
I think you should mention about the larger matrix in the question? But irrespective of the matrix dimension, I think this function gives the minimum number of computation required. You can also write your own function for the shift and compare the time consume between these two. – Amit Gupta Sep 28 '19 at 16:04
Not sure, I wanted to know if there's a mathematical solution\way to do it, rather than a programming hack – Jenia Golbstein Sep 28 '19 at 16:06
@JeniaGolbstein Basic on the current problem description, every pair of elements in both matrices need to be multiplied to get the output so I do not see a better solution in terms of time complexity. A better solution might exist if the actual output is a summary (e.g. the sum of multiplied matrices) or the input matrices have some special properties, e.g. a sparse 0-1 matrix. – GZ0 Sep 28 '19 at 16:14
1

@JeniaGolbstein effectively in this case... what "func" does there is need to produce N many arrays to multiple be be multiplied... I can't see how you can get better than something like: `m = np.vstack([[np.roll(a, n, 1)] for n in range(a.shape[1])]) * b` here... – Jon Clements Sep 28 '19 at 16:14
actually I did write my own, it takes same as numpy.roll rot_img = np.concatenate((img[:, r:], img[:, :r]), axis=1) – Jenia Golbstein Sep 28 '19 at 16:15

score 0 · Answer 4 · answered Sep 28 '19 at 16:29

0

I can actually pad the array from its two sides with 2 columns (to get 2x5 array) and run a conv2 with 'b' as a kernel, I think it's more efficient

answered Sep 28 '19 at 16:29

Jenia Golbstein

374
2
12

1

I would say good thought. But since, there's no sum-reduction, we won't actually be leveraging any performance benefits off those conv-based tools. – Divakar Sep 28 '19 at 16:52

Matrix element wise multiplication with shifted columns

4 Answers4