
Assume a function M2S that takes a (small) matrix as input and outputs a scalar. I would like to apply M2S to each block of a (large) matrix, with an efficient NumPy solution.

A solution I tested is:

import numpy as np
img_depth=np.random.randint(0,5,(480,640))
w_block=10

def M2S(img_block):
  img_valid=img_block[img_block!=0]
  if len(img_valid)==0:
    return None
  return np.mean(img_valid)

out1=[[uu+w_block/2, vv+w_block/2, M2S(img_depth[vv:vv+w_block,uu:uu+w_block])]
       for vv in range(0,img_depth.shape[0],w_block)
       for uu in range(0,img_depth.shape[1],w_block) ]

(Note that I serialized the matrix to get a list of [column of block, row of block, M2S output] for my purposes.)

This computation took 43 ms in my environment.

I also tried vectorization, but there was no speed improvement:

indices= np.indices(img_depth.shape)[:,0:img_depth.shape[0]:w_block,0:img_depth.shape[1]:w_block]
vM2S= np.vectorize(lambda v,u: M2S(img_depth[v:v+w_block,u:u+w_block]))
out2= np.dstack((indices[1]+w_block/2, indices[0]+w_block/2, vM2S(indices[0], indices[1]))).reshape(-1,3)

Are there any better ideas to improve the computation speed?


(Edited)

I drew a sketch to explain the above process.

The second step (serialization) can be computed quickly, so my question is mainly about the first step.

Akihiko

2 Answers

1

Using the as_strided function of NumPy looks like the best approach [1],[2].

First, let's practice as_strided with a small matrix.

M=np.array([[ 1,  2,  3,  4],
            [ 5,  6,  7,  8],
            [ 9, 10, 11, 12],
            [13, 14, 15, 16]], dtype=np.int32)
# M.shape: (4, 4); with 4-byte ints, M.strides == (16, 4)

Let us consider a 2x2 block and handle M as a (2,2,2,2) tensor. In this case, its strides are (32, 8, 16, 4).

subM=np.lib.stride_tricks.as_strided(M,(2,2,2,2),(32,8,16,4))

print(subM) gives:

[[[[ 1  2]
   [ 5  6]]

  [[ 3  4]
   [ 7  8]]]


 [[[ 9 10]
   [13 14]]

  [[11 12]
   [15 16]]]]

Then we make NumPy operations for this tensor equivalent to M2S. From the definition above, M2S is the mean of non-zero elements:

np.sum(subM,axis=(2,3)) / np.count_nonzero(subM,axis=(2,3))
array([[ 3.5,  5.5],
       [11.5, 13.5]])
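One caveat: if a block contains only zeros, count_nonzero is 0 and the division produces NaN (with a runtime warning), whereas the original M2S returned None. A minimal sketch of how that case behaves, using a hypothetical matrix with one all-zero block and np.errstate to silence the warning:

```python
import numpy as np

# Hypothetical 4x4 matrix whose top-right 2x2 block is all zeros
M = np.array([[ 1,  2,  0,  0],
              [ 5,  6,  0,  0],
              [ 9, 10, 11, 12],
              [13, 14, 15, 16]], dtype=np.int32)
subM = np.lib.stride_tricks.as_strided(M, (2, 2, 2, 2), (32, 8, 16, 4))

counts = np.count_nonzero(subM, axis=(2, 3))
with np.errstate(invalid="ignore", divide="ignore"):
    means = np.sum(subM, axis=(2, 3)) / counts

print(means)  # the all-zero block comes out as nan instead of None
```

Whether NaN is an acceptable stand-in for None depends on how the serialized list is consumed downstream.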

Finally, let us generalize this approach to solve the original task.

import numpy as np
img_depth=np.random.randint(0,5,(480,640))
w_block=10

view_shape=(img_depth.shape[0]//w_block, img_depth.shape[1]//w_block, w_block, w_block)
strides=(img_depth.strides[0]*w_block,img_depth.strides[1]*w_block,img_depth.strides[0],img_depth.strides[1])
img_depth2=np.lib.stride_tricks.as_strided(img_depth,view_shape,strides)
out3_0=np.sum(img_depth2,axis=(2,3)) / np.count_nonzero(img_depth2,axis=(2,3))

indices= np.indices(img_depth.shape)[:,0:img_depth.shape[0]:w_block,0:img_depth.shape[1]:w_block]
out3= np.dstack((indices[1]+w_block/2, indices[0]+w_block/2, out3_0)).reshape(-1,3)

This approach greatly improved the computation speed (from 43 ms to 1.7 ms in my environment).
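For reference, a minimal timeit harness to reproduce such a measurement (the absolute numbers will of course differ per machine):

```python
import timeit
import numpy as np

img_depth = np.random.randint(0, 5, (480, 640))
w_block = 10

def blockwise_m2s():
    # Strided (48, 64, 10, 10) view over the 10x10 blocks, no copy
    view_shape = (img_depth.shape[0] // w_block,
                  img_depth.shape[1] // w_block, w_block, w_block)
    strides = (img_depth.strides[0] * w_block, img_depth.strides[1] * w_block,
               img_depth.strides[0], img_depth.strides[1])
    blocks = np.lib.stride_tricks.as_strided(img_depth, view_shape, strides)
    # Mean of non-zero elements per block, equivalent to M2S
    return np.sum(blocks, axis=(2, 3)) / np.count_nonzero(blocks, axis=(2, 3))

elapsed = timeit.timeit(blockwise_m2s, number=100) / 100
print(f"{elapsed * 1e3:.2f} ms per call")
```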


(edited)

As @mad-physicist pointed out, we can create the tensor with the reshape method:

view_shape=(img_depth.shape[0]//w_block, w_block, img_depth.shape[1]//w_block, w_block)
img_depth2=img_depth.reshape(view_shape).swapaxes(1, 2)

Note that view_shape here is different from the view_shape above (axes 1 and 2 are swapped).

This gives the same result and is probably easier to understand, since we do not need to define the strides explicitly. One concern may be memory duplication, but I did not see much increase in computation time.
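The duplication concern can be checked directly: for a C-contiguous array whose shape divides evenly by w_block, both reshape and swapaxes return views, which np.shares_memory confirms:

```python
import numpy as np

img_depth = np.random.randint(0, 5, (480, 640))
w_block = 10

view_shape = (img_depth.shape[0] // w_block, w_block,
              img_depth.shape[1] // w_block, w_block)
img_depth2 = img_depth.reshape(view_shape).swapaxes(1, 2)

# No copy yet: both arrays refer to the same underlying buffer
print(np.shares_memory(img_depth, img_depth2))  # True

# The reduction works directly on the view
out = np.sum(img_depth2, axis=(2, 3)) / np.count_nonzero(img_depth2, axis=(2, 3))
print(out.shape)  # (48, 64)
```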

Akihiko
    `as_strided` is rarely the best approach, but probably a good one. A reshape + transpose would copy data here, while `as_strided` would not. – Mad Physicist Mar 25 '21 at 16:33
  • @mad-physicist Thanks! I coded the `reshape` approach and wrote it in the answer. – Akihiko Mar 26 '21 at 02:08
0

You can use fancy indexing together with a list comprehension, as in the following code:

import numpy as np

img_depth = np.random.randint(0, 5, (5, 5))
w_block = 2

def M2S(img_block):
    img_valid = img_block[img_block != 0]
    if len(img_valid) == 0:
        return None
    return np.mean(img_valid)

u = np.arange(0, img_depth.shape[0], w_block)
v = np.arange(0, img_depth.shape[1], w_block)

grid_list = [[x, y] for x in u for y in v]

sub_imgs = map(lambda coords: img_depth[coords[0]:coords[0]+w_block, coords[1]:coords[1]+w_block], grid_list)
vals = list(map(M2S, sub_imgs))

res = [[data_tup[0][0]+w_block/2, data_tup[0][1]+w_block/2, data_tup[1]] for data_tup in zip(grid_list, vals)]

Output

[[1.0, 1.0, 1.75], [1.0, 3.0, 3.6666666666666665], [1.0, 5.0, 2.0], [3.0, 1.0, 2.6666666666666665], [3.0, 3.0, 3.5], [3.0, 5.0, 2.0], [5.0, 1.0, 2.0], [5.0, 3.0, 2.5], [5.0, 5.0, 2.0]]

The main part here is mapping a lambda that slices the image into sub-images, and then mapping your M2S function onto each sub-image.

The last line then assembles the final result with a list comprehension.

Cheers.

Michael