
I have working code like this, but it is rather slow.

def halfconvolution(g,w,dz):
    convo=np.zeros_like(g)
    for i in range(0,len(g)):
        sum=0
        for j in range(0,i):
            sum+=g[j]*w[(i-j)]*dz
        convo[i] = -sum
    return convo

I am trying to turn it into a list comprehension, but I am struggling.
I tried:

convo=[-g*w[i-j] for i in g for j in w]
StarStrides
  • Why are you under the impression that a list comprehension will "be faster"? Don't obfuscate a loop for the sake of reducing lines of code – OneCricketeer Mar 25 '18 at 12:27
  • You should be using `for i in range(len(g))` and `for j in range(i)` in the comprehension, same as you do in the loop. Don't forget to measure and see if you really get better performance. – orip Mar 25 '18 at 12:31
  • @cricket_007 I for some reason thought list comprehensions were faster... I timed it, and indeed it is slower. Any recommendations on speeding up such code? – StarStrides Mar 25 '18 at 12:58
  • 1
    You are using Numpy, but doing loops to return an array... You ought to use *vectorized operations* over entire columns at once rather than indexing the column and updating individual cells – OneCricketeer Mar 25 '18 at 19:17

3 Answers


I am not sure whether this improves the performance, but here it is as a list comprehension, as you asked:

convo = [-sum(g[j] * w[i - j] * dz for j in range(0, i)) for i in range(0, len(g))]

A faster implementation using NumPy:

# make the matrices square
g = np.repeat(g, g.shape[0]).reshape(g.shape[0], g.shape[0], order='F')
w = np.repeat(w, w.shape[0]).reshape(w.shape[0], w.shape[0], order='F')

# take the lower half of g
g = np.tril(g, k=-1)

# shift each column by its index number
# see: https://stackoverflow.com/questions/20360675/roll-rows-of-a-matrix-independently
rows_w, column_indices_w = np.ogrid[:w.shape[0], :w.shape[1]]
shift = np.arange(w.shape[0])  # all shifts are non-negative, so no wrap-around correction is needed
w = w[rows_w, column_indices_w - shift[:, np.newaxis]].T

convo = -np.sum(g * w, axis=1) * dz  # note the minus sign, as in the original loop

For this to work, w and g need to be the same size, but otherwise I'm sure a workaround can be found.

I hope this is a more acceptable speedup for you. Always try to rewrite your logic/problem into vector/matrix operations.
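As a further sketch (my addition, not part of the original answer, assuming g and w are equal-length 1-D NumPy arrays): the triangular sum is really just a truncated discrete convolution, so `np.convolve` can replace all of the index juggling. The only correction needed is subtracting the j = i term, which the original loop excludes:

```python
import numpy as np

def halfconvolution_np(g, w, dz):
    n = len(g)
    # full linear convolution: c[i] = sum over j <= i of g[j] * w[i-j]
    c = np.convolve(g, w)[:n]
    # the original loop only sums j < i, so drop the j = i term g[i] * w[0]
    return -dz * (c - g * w[0])
```

This keeps the O(n^2) cost but moves it into a single optimized C routine instead of building n-by-n intermediate matrices.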

Maarten-vd-Sande

The inner loop can be replaced by the built-in sum function (don't shadow it with a variable of the same name, as the original code does).

Then you append the outer loop to the end of that:

 [-sum(g[j]*w[i-j]*dz for j in range(i)) for i in range(len(g))]
OneCricketeer

Don't use list comprehensions for performance reasons

Use

  • Numba
  • Cython
  • Vectorized Numpy operations

Numba

import numba as nb
import numpy as np
import time

@nb.njit(fastmath=True)
def halfconvolution(g, w, dz):
    convo = np.empty(g.shape[0], dtype=g.dtype)
    for i in range(g.shape[0]):
        acc = 0.  # avoid shadowing the built-in sum
        for j in range(0, i):
            acc += g[j] * w[i - j] * dz
        convo[i] = -acc
    return convo

g=np.random.rand(1000)
w=np.random.rand(1000)
dz=0.15

t1 = time.time()
for i in range(1000):
    #res = halfconvolution(g, w, dz)
    res = [-sum(g[j]*w[i-j]*dz for j in range(i)) for i in range(len(g))]

print((time.time() - t1) / 1000, "s per iteration")
print("Done")

Performance

List Comprehension: 0.27s per iteration 
Numba Version: 0.6ms per iteration

So there is a factor of roughly 500 between these two versions. If you want to call this function on multiple arrays at once, you can also parallelize the problem easily, and you should get at least another "number of cores" speedup.
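If Numba is not an option, a similar speedup is available from an FFT-based convolution (my sketch, not part of the original answer, using only NumPy): zero-pad both arrays so that the circular convolution equals the linear one, then subtract the j = i term that the original loop excludes. This drops the cost from O(n^2) to O(n log n):

```python
import numpy as np

def halfconvolution_fft(g, w, dz):
    n = len(g)
    m = 2 * n  # pad length >= 2n - 1, so circular convolution equals linear
    # linear convolution via real FFTs: c[i] = sum over j <= i of g[j] * w[i-j]
    c = np.fft.irfft(np.fft.rfft(g, m) * np.fft.rfft(w, m), m)[:n]
    # the original loop only sums j < i, so drop the j = i term g[i] * w[0]
    return -dz * (c - g * w[0])
```

For the 1000-element arrays benchmarked here the FFT route should be competitive with the compiled loop, and it scales better as the arrays grow.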

max9111