
Say I have two arrays, A and B.

An element-wise multiplication is defined as follows: [image: definition of the element-wise product of A and B]

I want to do an element-wise multiplication in a convolution-like manner, i.e., repeatedly shift every column of A one step to the right with wraparound (for example, column 1 becomes column 2 and column 3 wraps around to become column 1) and multiply by B after each shift. This should yield a 2 x 3 x 3 array (one 2x3 matrix for each of the 3 possible shifts).

[image: expected output - the three shifted 2x3 element-wise products]
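
For concreteness, here is a minimal sketch of what I mean, written as a plain loop over np.roll (the array values and the shift order are just an example for illustration):

import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])          # example 2x3 array
B = np.array([[1, 0, 1],
              [0, 0, 1]])          # example 2x3 array

# one element-wise product per cyclic column shift of A,
# stacked along a new leading axis
out = np.stack([np.roll(A, k, axis=1) * B for k in range(A.shape[1])])
print(out.shape)   # (3, 2, 3): one 2x3 matrix per shift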

Jenia Golbstein

4 Answers


We can concatenate A with a slice of itself and then get sliding windows over the result. To get those windows, we can leverage scikit-image's view_as_windows, which is based on np.lib.stride_tricks.as_strided. Then, multiply those windows with B for the final output. More info on the use of as_strided based view_as_windows.

Hence, we have a vectorized solution, like so -

In [70]: from skimage.util.shape import view_as_windows

In [71]: A1 = np.concatenate((A,A[:,:-1]),axis=1)

In [74]: view_as_windows(A1,A.shape)[0]*B
Out[74]: 
array([[[1, 0, 3],
        [0, 0, 6]],

       [[2, 0, 1],
        [0, 0, 4]],

       [[3, 0, 2],
        [0, 0, 5]]])

We can also leverage multiple cores with the numexpr module for the final step of broadcasted multiplication, which should be better on larger arrays. For the sample case, it would be -

In [53]: import numexpr as ne

In [54]: w = view_as_windows(A1,A.shape)[0]

In [55]: ne.evaluate('w*B')
Out[55]: 
array([[[1, 0, 3],
        [0, 0, 6]],

       [[2, 0, 1],
        [0, 0, 4]],

       [[3, 0, 2],
        [0, 0, 5]]])

Timings on large arrays comparing the proposed two methods -

In [56]: A = np.random.rand(500,500)
    ...: B = np.random.rand(500,500)

In [57]: A1 = np.concatenate((A,A[:,:-1]),axis=1)
    ...: w = view_as_windows(A1,A.shape)[0]

In [58]: %timeit w*B
    ...: %timeit ne.evaluate('w*B')
1 loop, best of 3: 422 ms per loop
1 loop, best of 3: 228 ms per loop

Squeezing the best out of the strided-based method

To really squeeze the best out of the strided-view-based approach, go with np.lib.stride_tricks.as_strided directly to avoid the function-call overhead of view_as_windows -

def vaw_with_as_strided(A,B):
    # append all but the last column so every cyclic shift is a contiguous window
    A1 = np.concatenate((A,A[:,:-1]),axis=1)
    s0,s1 = A1.strides
    # one (m,n) window per column shift -> view of shape (n,m,n)
    S = (A.shape[1],)+A.shape
    w = np.lib.stride_tricks.as_strided(A1,shape=S,strides=(s1,s0,s1))
    return w*B
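
As a quick sanity check on small random inputs (purely illustrative), the strided version should produce exactly the same result as the view_as_windows based one -

# illustrative check, assuming the imports from earlier in this answer
A = np.random.rand(4,5)
B = np.random.rand(4,5)

A1 = np.concatenate((A,A[:,:-1]),axis=1)
w = view_as_windows(A1,A.shape)[0]      # reference windows

assert np.allclose(w*B, vaw_with_as_strided(A,B))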

Comparing against @Paul Panzer's array-assignment based one, the crossover seems to be at 19x19 shaped arrays -

In [33]: n = 18
    ...: A = np.random.rand(n,n)
    ...: B = np.random.rand(n,n)

In [34]: %timeit vaw_with_as_strided(A,B)
    ...: %timeit pp(A,B)
10000 loops, best of 3: 22.4 µs per loop
10000 loops, best of 3: 21.4 µs per loop

In [35]: n = 19
    ...: A = np.random.rand(n,n)
    ...: B = np.random.rand(n,n)

In [36]: %timeit vaw_with_as_strided(A,B)
    ...: %timeit pp(A,B)
10000 loops, best of 3: 24.5 µs per loop
10000 loops, best of 3: 24.5 µs per loop

So, for anything smaller than 19x19, the array-assignment approach seems to be better, and for anything larger, the strided-based one should be the way to go.
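
If a single entry point is handy, a size-based dispatch is easy to add. A minimal sketch (the helper name is made up, the 19x19 cutoff comes from the timings above and will vary with hardware), using @Paul Panzer's pp from his answer -

def rolled_elementwise_multiply(A, B, cutoff=19):
    # cutoff is machine-dependent; 19 comes from the timings above
    if max(A.shape) < cutoff:
        return pp(A, B)                  # array-assignment based method
    return vaw_with_as_strided(A, B)     # strided-view based method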

Divakar
  • `view_as_windows`/`as_strided` seem to have massive constant overhead, so they are only worth it for large enough arrays. 30x30 is about where it starts beating my copy-based approach. – Paul Panzer Sep 29 '19 at 05:22
  • @PaulPanzer Not sure what you are getting at. `view_as_windows`/`as_strided` give a 3D view, and since the final output would be 3D anyway, that seems like a reasonable/efficient way to go. And which copy-based approach are we comparing it against? – Divakar Sep 29 '19 at 05:43
  • Made a small post. – Paul Panzer Sep 29 '19 at 06:45

Just a note on view_as_windows/as_strided. Neat as these functions are, it is useful to know that they have a rather pronounced constant overhead. Here is a comparison between @Divakar's view_as_windows based solution (vaw) and a copy-reshape based approach of mine.

[benchmark plot: runtime vs. array size for vaw and pp]

As you can see, vaw is not very fast on small to medium sized operands and only begins to shine above an array size of 30x30.

Code:

from simple_benchmark import BenchmarkBuilder, MultiArgument
import numpy as np
from skimage.util.shape import view_as_windows

B = BenchmarkBuilder()

@B.add_function()
def vaw(A,B):
    A1 = np.concatenate((A,A[:,:-1]),axis=1)
    w = view_as_windows(A1,A.shape)[0]
    return w*B

@B.add_function()
def pp(A,B):
    m,n = A.shape
    # buffer with one (m, 2*n) block per column shift
    aux = np.empty((n,m,2*n),A.dtype)
    AA = np.concatenate([A,A],1)
    # each row of the reshaped flat view is one element shorter than a full
    # (m, 2*n) block, so broadcasting the doubled, flattened A into the rows
    # fills block i with A cyclically shifted by i columns
    aux.reshape(-1)[:-n].reshape(n,-1)[...] = AA.reshape(-1)[:-1]
    return aux[...,:n]*B

@B.add_arguments('array size')
def argument_provider():
    for exp in range(4, 16):
        dim_size = int(1.4**exp)
        a = np.random.rand(dim_size,dim_size)
        b = np.random.rand(dim_size,dim_size)
        yield dim_size, MultiArgument([a,b])

r = B.run()
r.plot()

import pylab
pylab.savefig('vaw.png')
Paul Panzer
  • Good effort. You were including the functional overhead of `view_as_windows`. Edited with native strided method. Now the crossover is at 19x19. Check out my post for the timings. – Divakar Sep 29 '19 at 09:02
  • @Divakar Whoa, didn't expect such a large difference between `as_strided` and `view_as_windows`. – Paul Panzer Sep 29 '19 at 09:12

Run a for loop over the number of columns and use np.roll() along axis=1 to shift the columns, then do the element-wise multiplication.

Refer to the accepted answer in this reference. Hope this helps.
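
A rough sketch of that idea (the function name and details are just illustrative, not benchmarked) -

import numpy as np

def roll_multiply(A, B):
    out = []
    for k in range(A.shape[1]):          # one cyclic shift per column
        out.append(np.roll(A, k, axis=1) * B)
    return np.stack(out)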

Amit Gupta
  • Cool, I didn't know about `np.roll` yet! – Energya Sep 28 '19 at 16:02
  • Thanks, this is what I was doing but I wondered if there's a better, more efficient way since I'm working with large matrices – Jenia Golbstein Sep 28 '19 at 16:03
  • I think you should mention the large matrices in the question. But irrespective of the matrix dimensions, I think this function requires the minimum number of computations. You can also write your own function for the shift and compare the time consumed by the two. – Amit Gupta Sep 28 '19 at 16:04
  • Not sure, I wanted to know if there's a mathematical solution/way to do it, rather than a programming hack – Jenia Golbstein Sep 28 '19 at 16:06
  • @JeniaGolbstein Based on the current problem description, every pair of elements in the two matrices needs to be multiplied to get the output, so I do not see a better solution in terms of time complexity. A better solution might exist if the actual output is a summary (e.g. the sum of the multiplied matrices) or the input matrices have some special properties, e.g. a sparse 0-1 matrix. – GZ0 Sep 28 '19 at 16:14
  • @JeniaGolbstein Effectively in this case, whatever "func" does, it needs to produce N arrays to be multiplied... I can't see how you can get better than something like: `m = np.vstack([[np.roll(a, n, 1)] for n in range(a.shape[1])]) * b` here... – Jon Clements Sep 28 '19 at 16:14
  • actually I did write my own, it takes same as numpy.roll rot_img = np.concatenate((img[:, r:], img[:, :r]), axis=1) – Jenia Golbstein Sep 28 '19 at 16:15

I can actually pad the array on its two sides with 2 columns (to get a 2x5 array) and run a conv2 with 'b' as the kernel. I think it's more efficient.

Jenia Golbstein
  • I would say good thought. But since there's no sum-reduction, we won't actually be leveraging any performance benefits of those conv-based tools. – Divakar Sep 28 '19 at 16:52