5

I'm trying to implement convolutional neural network in Python.
However, when I use signal.convolve or np.convolve, it can not do convolution on X, Y(X is 3d, Y is 2d). X are training minibatches. Y are filters. I don't want to do for loop for every training vector like:

for i in xrange(X.shape[2]):
    result = signal.convolve(X[:,:,i], Y, 'valid')
    ....

So, is there any function I can use to do convolution efficiently?

kennytm
  • 510,854
  • 105
  • 1,084
  • 1,005
Vito
  • 426
  • 3
  • 11

1 Answers1

5

Scipy implements standard N-dimensional convolutions, so that the matrix to be convolved and the kernel are both N-dimensional.

A quick fix would be to add an extra dimension to Y so that Y is 3-Dimensional:

result = signal.convolve(X, Y[..., None], 'valid')

I'm assuming here that the last axis corresponds to the image index as in your example [width, height, image_idx] (or [height, width, image_idx]). If it is the other way around and the images are indexed in the first axis (as it is more common in C-ordering arrays) you should replace Y[..., None] with Y[None, ...].

The line Y[..., None] will add an extra axis to Y, making it 3-dimensional [kernel_width, kernel_height, 1] and thus, converting it to a valid 3-Dimensional convolution kernel.

NOTE: This assumes that all your input mini-batches have the same width x height, which is standard in CNN's.


EDIT: Some timings as @Divakar suggested.

The testing framework is setup as follows:

def test(S, N, K):
    """ S: image size, N: num images, K: kernel size"""
    a = np.random.randn(S, S, N)
    b = np.random.randn(K, K)
    valid = [slice(K//2, -K//2+1), slice(K//2, -K//2+1)]

    %timeit signal.convolve(a, b[..., None], 'valid')
    %timeit signal.fftconvolve(a, b[..., None], 'valid')
    %timeit ndimage.convolve(a, b[..., None])[valid]

Find bellow tests for different configurations:

  • Varying image size S:

    >>> test(100, 50, 11) # 100x100 images
    1 loop, best of 3: 909 ms per loop
    10 loops, best of 3: 116 ms per loop
    10 loops, best of 3: 54.9 ms per loop
    
    >>> test(1000, 50, 11) # 1000x1000 images
    1 loop, best of 3: 1min 51s per loop
    1 loop, best of 3: 16.5 s per loop
    1 loop, best of 3: 5.66 s per loop
    
  • Varying number of images N:

    >>> test(100, 5, 11) # 5 images
    10 loops, best of 3: 90.7 ms per loop
    10 loops, best of 3: 26.7 ms per loop
    100 loops, best of 3: 5.7 ms per loop
    
    >>> test(100, 500, 11) # 500 images
    1 loop, best of 3: 9.75 s per loop
    1 loop, best of 3: 888 ms per loop
    1 loop, best of 3: 727 ms per loop
    
  • Varying kernel size K:

    >>> test(100, 50, 5) # 5x5 kernels
    1 loop, best of 3: 217 ms per loop
    10 loops, best of 3: 100 ms per loop
    100 loops, best of 3: 11.4 ms per loop
    
    >>> test(100, 50, 31) # 31x31 kernels
    1 loop, best of 3: 4.39 s per loop
    1 loop, best of 3: 220 ms per loop
    1 loop, best of 3: 560 ms per loop
    

So, in short, ndimage.convolve is always faster, except when the kernel size is very large (as K = 31 in the last test).

Imanol Luengo
  • 15,366
  • 2
  • 49
  • 67
  • Good find! For a faster version use `ndimage.filters.convolve(X,Y[...,None])` maybe with some slicing? – Divakar Aug 30 '16 at 12:15
  • @Divakar it would be nice to time it (I'll do it if I can get some time), but I think that `ndimage,filters.convolve` should be slower for large kernel sizes (as it does convolution in the spatial domain whereas `signal.convolve` does it in the fourier domain). The advantage of the `ndimage` filters are that support different boundary conditions (different from fourier which I think that by definition is equivalent to padding the image with 0s). I'll do the timings asap! – Imanol Luengo Aug 30 '16 at 12:20
  • @Divakar tests done! I'm feeling like this is some kind of dejavu. As you suggested, `ndimage.convolve` tends to be faster in general. I remember having a similar conversation about this in a different thread haha – Imanol Luengo Aug 30 '16 at 13:21
  • You know what that's good memory and not just dejavu! ;) Here's the post I think you were referring to : http://stackoverflow.com/a/38232755/3293881 – Divakar Aug 30 '16 at 13:25
  • @Divakar Haha exactly that one yes! That time I was proposing `uniform_filter` over `signal.convolve` lol. Although `uniform_filter` was faster due to the *separability* of the kernels, the above results seem to confirm the conclusion we arrived in that thread: for large kernel sizes fourier (fftconvolve) > spatial, if not spatial > fourier. – Imanol Luengo Aug 30 '16 at 13:31
  • Yup! That theory prevails here! :D – Divakar Aug 30 '16 at 13:31