
I'm looking for a way to speed up the following code. This block is a simplified version of an image analysis pipeline I've developed, but the main structure is there. I do need the Gaussian filter; since it comes from a source outside my control, I've left it in the example.

In the stack definition I've included the size of the images I'm using: approximately 200 frames at a resolution of 1024 × 512.

Is there a way to speed this up with Python?

import numpy as np
import time
from scipy import ndimage as ndi
 
def gauss(stack):
    # Subtract a heavily blurred copy from each frame (a high-pass filter)
    for i in range(stack.shape[0]):
        stack[i, :, :] -= ndi.gaussian_filter(stack[i, :, :], 10)
    return stack

stack = np.random.randint(0, 20, size=(200,1024,512))
t = time.time()
gs = gauss(stack)
print(f"{time.time() - t:0.3f} seconds")
Knova

1 Answer


If you do not mind losing a little precision (you may not notice the difference), you can use the `truncate` parameter of the Gaussian filter (default is 4.0), for example truncating at 2 sigmas (or at least at 3).

import numpy as np
import time
from scipy import ndimage as ndi


def gauss(stack, truncate):
    for i in range(stack.shape[0]):
        stack[i, :, :] -= ndi.gaussian_filter(stack[i, :, :], 10, truncate=truncate)
    return stack


for truncate in [4, 3, 2]:
    stack = np.random.randint(0, 20, size=(200, 1024, 512))
    t = time.time()
    gs = gauss(stack, truncate=truncate)
    print(f"truncate {truncate}: {time.time() - t:0.3f} seconds")
truncate 4: 8.462 seconds
truncate 3: 6.021 seconds
truncate 2: 4.257 seconds
dankal444
  • This is great, thank you! I didn't know it existed. Would you recommend any other syntax-related optimizations? Is there a faster way to loop through each frame in the stack? – Knova Jan 04 '23 at 16:05
  • 1
    @Knova sorry nothing else similar to this. But I can suggest going for `multiprocessing`. – dankal444 Jan 04 '23 at 22:24
  • 1
    You can try to compute this more efficiently using a FFT. FFTs are generally pretty fast for large kernels. Libraries like the FFTW are known to be fast for that and they provide a parallel implementation. – Jérôme Richard Jan 06 '23 at 20:09
  • @JérômeRichard, would you be interested in writing an example for this? I'm curious to see your solution. – Knova Jan 07 '23 at 05:35
  • 1
    @Knova This is a bit tedious to do, but there are scipy functions doing it (see [here](https://stackoverflow.com/questions/1100100/fft-based-2d-convolution-and-correlation-in-python)). IDK if `ndi.gaussian_filter` uses this methods, but probably not. AFAIK Scipy has its own FFT implementation that is certainly not as fast as the FFTW (and likely not parallel) but simpler. Besides, gaussian filter can be aproximated using multiple 1D filter (vertical+horizontal ones). See [this](https://en.wikipedia.org/wiki/Separable_filter) for more information. – Jérôme Richard Jan 07 '23 at 06:10
  • 2
    @JérômeRichard "gaussian filter can be aproximated using multiple 1D filter". Thats what scipy does. And 1D filters are made using simple correlation method. – dankal444 Jan 07 '23 at 21:26
  • 2
    @dankal444 Interesting. It make sense regarding the variation of the speed based on the changing parameter in your benchmark. However, the performance of the operation looks really bad to me. Maybe it is because of the speed of the simple correlation method. Maybe it does not use SIMD code nor multiple thread. I think it should be about 10 times faster on a mainstream (relatively recent) PC. – Jérôme Richard Jan 07 '23 at 23:51