
Let's suppose I have two arrays that represent pixels in pictures.

I want to build an array of tensordot products of pixels of a smaller picture with a bigger picture as it "scans" the latter. By "scanning" I mean iteration over rows and columns while creating overlays with the original picture.

For instance, a 2x2 picture can be overlaid on top of 3x3 in four different ways, so I want to produce a four-element array that contains tensordot products of matching pixels.

Tensordot is calculated by multiplying a[i,j] with b[i,j] element-wise and summing the terms.
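For concreteness, that definition can be checked directly against `np.tensordot` (the 2x2 values below are illustrative, not from the pictures above):

```python
import numpy as np

# Illustrative 2x2 values, not from the question's arrays
a = np.array([[0, 1], [2, 3]])
b = np.array([[4, 5], [6, 7]])

manual = (a * b).sum()                      # 0*4 + 1*5 + 2*6 + 3*7 = 38
via_tensordot = np.tensordot(a, b, axes=2)  # same contraction in one call
print(manual, via_tensordot)
```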

Please examine this code:

import numpy as np

a = np.array([[0, 1, 2],
              [3, 4, 5],
              [6, 7, 8]])

b = np.array([[0, 1],
              [2, 3]])

# Number of valid overlay positions along each axis
shape_diff = (a.shape[0] - b.shape[0] + 1,
              a.shape[1] - b.shape[1] + 1)

def compute_pixel(x, y):
    # Slice out the window of a that b currently covers
    sub_matrix = a[x : x + b.shape[0],
                   y : y + b.shape[1]]
    return np.tensordot(sub_matrix, b, axes=2)

def process():
    arr = np.zeros(shape_diff)
    for i in range(shape_diff[0]):
        for j in range(shape_diff[1]):
            arr[i, j] = compute_pixel(i, j)
    return arr

print(process())

Computing a single pixel is easy: all I need are the starting coordinates within a. From there I slice out a sub-matrix matching the size of b and take the tensordot product.

However, because I need to repeat this for every (x, y) location while iterating over rows and columns, I've had to use a nested loop, which is of course suboptimal.

In the next piece of code I have tried to exploit a handy feature of tensordot: it also accepts higher-dimensional tensors as arguments. In other words, I can feed it an array of sub-matrices of a while keeping b the same.

However, to build that array of sub-matrices I couldn't think of anything better than another pair of loops, which rather defeats the purpose.

def try_vector():
    # Stack every b-sized window of a into one 4-D tensor
    tensor = np.zeros(shape_diff + b.shape)
    for i in range(shape_diff[0]):
        for j in range(shape_diff[1]):
            tensor[i, j] = a[i : i + b.shape[0],
                             j : j + b.shape[1]]

    return np.tensordot(tensor, b, axes=2)

print(try_vector())

Note: the tensor's shape is the concatenation of the two shape tuples, which in this case gives (2, 2, 2, 2).

Regardless, even if I produced such an array this way, it would be prohibitively large to be of any practical use; doing this for a 1000x1000 picture could consume all available memory.

So, are there other ways to avoid loops in this problem?

Anonymous
  • why does for loop affect memory usage in this case? – Z Li Dec 21 '21 at 18:13
  • The first part of the code, which also uses a loop does not bloat memory because it only accesses the elements it needs, the loop in the second part creates a large array that bloats memory. – Anonymous Dec 21 '21 at 18:18
  • Consider implementing this with the [numba package](https://numba.pydata.org/) – trianta2 Dec 21 '21 at 18:25
  • Well it seems there is no way to avoid for loops. You could consider to speed it up with `numba`. There are ways to create 3D matrix to include all slices as in your second approach but I think it may run into memory issues. – Z Li Dec 21 '21 at 18:44
  • tensorflow may have a fast implementation but this is limited to numpy so.. – Z Li Dec 21 '21 at 18:48
  • Am I missing something, how is this different from correlation (flipped convolution) with b as the kernel? – Andrew Holmgren Dec 21 '21 at 19:00
  • 1
    Ah yes, it's actually called convolve 2d, found solutions here: https://stackoverflow.com/questions/43086557/convolve2d-just-by-using-numpy – Anonymous Dec 21 '21 at 19:15
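The comments above identify the operation as 2-D cross-correlation (convolution without kernel flipping). As a hedged sketch, assuming SciPy is available, `scipy.signal.correlate2d` with `mode='valid'` reproduces the question's `process()` output directly:

```python
import numpy as np
from scipy.signal import correlate2d

a = np.arange(9).reshape(3, 3)
b = np.array([[0, 1], [2, 3]])

# mode='valid' keeps only the positions where b fits entirely inside a
corr = correlate2d(a, b, mode='valid')
print(corr)  # [[19 25]
             #  [37 43]]
```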

1 Answer

In [111]: process()
Out[111]: 
array([[19., 25.],
       [37., 43.]])

tensordot with axes=2 is the same as an element-wise multiply and sum:

In [116]: np.tensordot(a[0:2,0:2],b, axes=2)
Out[116]: array(19)
In [126]: (a[0:2,0:2]*b).sum()
Out[126]: 19

A lower-memory way of generating your tensor is:

In [121]: np.lib.stride_tricks.sliding_window_view(a,(2,2))
Out[121]: 
array([[[[0, 1],
         [3, 4]],

        [[1, 2],
         [4, 5]]],


       [[[3, 4],
         [6, 7]],

        [[4, 5],
         [7, 8]]]])

We can do a broadcasted multiply, and sum on the last 2 axes:

In [129]: (Out[121]*b).sum((2,3))
Out[129]: 
array([[19, 25],
       [37, 43]])
hpaulj
  • I figured I could use **np.lib.stride_tricks.as_strided()** that gives pointers to array elements. How is it different from np.lib.stride_tricks.sliding_window_view() and what are the memory overheads? – Anonymous Dec 21 '21 at 20:06
  • The `view` function (did you read the docs?) is just a safer version of as_strided. – hpaulj Dec 21 '21 at 21:53