
So I have this definition here,

DP[i, j] = f[i, j] + min(DP[i-1, j-1], DP[i-1, j], DP[i-1, j+1])

which defines the minimum accrued cost to travel from the top of an NxM matrix to the bottom. Each cell of f holds the cost (1.2, 0, 10, etc.) of moving into that cell from another cell.
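
To illustrate (with made-up numbers), here is how the recurrence plays out on a tiny 3x3 cost matrix, leaving the starting rows and the boundary columns untouched as in my code below:

f = [[0, 0, 0],
     [1, 5, 2],
     [9, 1, 9]]
# Rows 0 and 1 are starting rows, so DP[1] = f[1] = [1, 5, 2]
# DP[2, 1] = f[2, 1] + min(DP[1, 0], DP[1, 1], DP[1, 2]) = 1 + min(1, 5, 2) = 2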

The matrix can be large (1500x1500; it's the gradient map of an image), and the DP loop I wrote takes about a second per run on my matrices. The recurrence needs to run hundreds of times per execution, so the total program runtime comes out to several minutes. This loop is about 99% of my bottleneck, so I am trying to optimize it with NumPy's vectorization methods. I only have access to NumPy and SciPy.

Note: I hardly program in Python at all, so the solution may just be something obvious.

First attempt: just the straightforward loop. Time here is about 2-2.5 seconds per run.

DP = f.copy()
for r in range(2, DP.shape[0] - 1):      # rows 0 and 1 are left unchanged
    for c in range(1, DP.shape[1] - 1):  # boundary columns are left unchanged
        DP[r, c] += DP[r - 1, c - 1:c + 2].min()

Second attempt: I tried to leverage NumPy's "fromiter" to calculate entire rows at a time rather than going column by column. Time here is about 1-1.5 seconds per run. My goal is to make this at least an order of magnitude faster, but I am stumped on how else to optimize this loop.

import numpy as np

DP = f.copy()
for r in range(2, DP.shape[0] - 1):
    def foo(arr):
        idx, val = arr
        if idx == 0 or idx == DP.shape[1] - 1:  # boundary columns get no window
            return np.inf
        return val + min(DP[r - 1, idx - 1], DP[r - 1, idx], DP[r - 1, idx + 1])

    DP[r, :] = np.fromiter(map(foo, enumerate(DP[r, :])), dtype=DP.dtype)
Jerm
    Most of the fast `numpy` methods are 'parallel' in nature, using compiled code to operate on all elements of an array without any implied order. Your case looks sequential in nature: the value at row `r` depends on a window in the previous row. `fromiter` isn't "vectorization", though in some cases it may be faster than other forms of iteration. That said, that `c` loop does look like it could be written in a way that calculates all row values "at once". – hpaulj Mar 21 '21 at 14:41
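
To make that hint concrete, here is a rough sketch (my own illustration, not code from the comment, assuming f is a float NumPy array) that computes each whole row at once from three shifted slices of the previous row:

import numpy as np

DP = f.copy()
for r in range(2, DP.shape[0] - 1):
    prev = DP[r - 1]
    # elementwise minimum of the left, centre and right neighbours in the previous row
    DP[r, 1:-1] += np.minimum(np.minimum(prev[:-2], prev[1:-1]), prev[2:])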

1 Answer


As hpaulj stated, since your problem is inherently sequential it will be hard to fully vectorize, although it does seem possible in principle (every cell ultimately depends only on a window of the unchanged starting rows; the window just gets wider for each following row), so perhaps you can find a smart way to do it!

That being said, a quick, half-vectorized solution is to use the neat sliding-window trick with fancy indexing proposed by user42541, replacing the inner loop with a single vectorized call:

# column c-1 of indexer holds the window indices (c-1, c, c+1) into the previous row
indexer = np.arange(3)[:, None] + np.arange(DP.shape[1] - 2)[None, :]
for r in range(2, DP.shape[0] - 1):
    # column-wise min of the 3 x (M-2) fancy-indexed block = sliding-window minimum
    DP[r, 1:-1] += np.min(DP[r - 1, indexer], axis=0)

This results in a speed-up of about two orders of magnitude over your double-loop method for a 1500x1500 array of integers (your fromiter version didn't run on my PC).
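
Since you mention SciPy is also available: as a further sketch (not part of the original answer), scipy.ndimage.minimum_filter1d can compute the same length-3 sliding minimum of the previous row, keeping the row loop but replacing the window construction entirely:

from scipy import ndimage

DP = f.copy()
for r in range(2, DP.shape[0] - 1):
    # minimum over each centred length-3 window of the previous row
    win_min = ndimage.minimum_filter1d(DP[r - 1], size=3)
    DP[r, 1:-1] += win_min[1:-1]

Whether this beats the fancy-indexing version would need measuring on your data.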

Lith
    Thank you! This sped things up by 2 orders of magnitude and is so much faster now. My second solution looks like it didn't actually calculate it rows at a time, but it sped up my code so it must just have sped up the calculations. – Jerm Mar 21 '21 at 18:35
  • Glad it was useful! It's always nice to think about well-formulated questions :) – Lith Mar 21 '21 at 18:50