
I am trying to find the right scaling, angle and translation to align two images with different resolutions using differential_evolution (shown in the code below). This works, but the more pixels I want to compare, the longer my code takes to run.

My approach in detail: I apply an affine transformation with the parameters sc1, sc2 (scaling factors), al (angle), u1 (horizontal translation) and u2 (vertical translation) to the coordinates of the higher-resolution image (coora2). I then want to find the minimum of the sum of squared differences between the two images. The code below is a simplified version of my real data.
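
Written out per coordinate (this is just the expression used inside tempsum3 below), each normed A2 coordinate (x, y) is mapped to:

# affine transform of a normed coordinate (x, y); al is the angle in degrees
x_new = np.cos(np.radians(al))*sc1*x - np.sin(np.radians(al))*sc2*y + u1
y_new = np.sin(np.radians(al))*sc1*x + np.cos(np.radians(al))*sc2*y + u2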

The approach works for small values of w1, h1 and w2, h2. My problem is that the more data I have (increasing w1, h1 or w2, h2), the longer the runtime gets. For my original data I have w1=338, h1=514, w2=3610 and h2=5190, and the program has been running for days now. I would be really thankful for any advice on how to speed up the code to find the right parameters. The simplified code is the following:

from math import floor
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import Bounds, differential_evolution

w1=8 #A1
h1=5
w2=25 #A2
h2=14
hw1=w1*h1
hw2=w2*h2
A1=np.ones((h1,w1))
A2=np.ones((h2,w2))

for n1 in np.arange(2,4):
    A1[1][n1]=1000

for n2 in np.arange(5,9):
    A2[6][n2]=1000
    A2[7][n2]=1000

#Norm the raw data
maxA1=np.max(A1)
A1=1/maxA1*A1
#Norm the raw data
maxA2=np.max(A2)
A2=1/maxA2*A2
   
fig, axes = plt.subplots(1, 2)
axes[0].imshow(A2)
axes[0].set_title('original-A2')
axes[1].imshow(A1)
axes[1].set_title('shifted-A1')
plt.show()

def xy_to_pixelvalue(x, y, dataArray): # getting the values to the normed pixels
    height = dataArray.shape[0]
    width = dataArray.shape[1]
    if -1 <= x <= 1 and -1 <= y <= 1:
        xcoord = floor((x + 1) / 2 * (width - 1))
        ycoord = floor((y + 1) / 2 * (height - 1))
        return dataArray[ycoord][xcoord]
    else:
        return 0

#norming pixel coordinates
A1x=np.linspace(-1,1,w1)
A1y=np.linspace(-1,1,h1)

A2x=np.linspace(-1,1,w2)
A2y=np.linspace(-1,1,h2)

#normed coordinates of A2 in a matrix
Ap2=np.zeros((h2,w2,2))
for i in np.arange(0,h2):
    for j in np.arange(0,w2):
     Ap2[i][j]=(A2x[j],A2y[i])
#defining a vector with the coordinates of A2
d=[]
cdata=Ap2[:,:,0:2]
for i in np.arange(0,h2):
    for j in np.arange(0,w2):
        d.append((cdata[i][j][0],cdata[i][j][1]))
d=np.asarray(d)
coora2=d.transpose()
coora2=np.array(coora2, dtype=np.float64)

def tempsum3(x):
    tempsum = 0
    for l in np.arange(0, hw2):
        # affine transform of the l-th A2 coordinate with x = (sc1, sc2, al, u1, u2)
        xt = np.cos(np.radians(x[2]))*x[0]*coora2[0][l] - np.sin(np.radians(x[2]))*x[1]*coora2[1][l] + x[3]
        yt = np.sin(np.radians(x[2]))*x[0]*coora2[0][l] + np.cos(np.radians(x[2]))*x[1]*coora2[1][l] + x[4]
        tempsum += (xy_to_pixelvalue(xt, yt, A1) - xy_to_pixelvalue(coora2[0][l], coora2[1][l], A2))**2
    return tempsum

x0 = np.array([1,1,-0.5,0,0])
bounds = Bounds([0.1,0.1,-1,-4,-4], [5,5,2,4,4])
rslt = differential_evolution(tempsum3, bounds=bounds, popsize= 5*15, x0=x0)
print(rslt)

2 Answers


The default implementation of Python is the CPython interpreter. CPython does not optimize mathematical expressions. Additionally, Numpy has a huge overhead for computing scalar values (due to many checks, like type/value checks). One optimization is to pre-compute some expressions in tempsum3. That being said, a better approach is to use a compiler like Numba (a just-in-time compiler) or Cython (a static compiler). Numba can optimize Numpy calls to remove most of the overhead and also avoid the repeated computation of the same expressions. Here is an example:

import numba as nb

# [...]

@nb.njit
def xy_to_pixelvalue(x,y,dataArray):
    # [...]

@nb.njit('(float64[::1],)')
def tempsum3(x):
    # [...]

# [...]

This is at least 100 times faster on my machine. Keep in mind that interpreters like CPython are not made for intensive computing. An alternative solution is to use vectorized Numpy functions, but they are not great for small arrays (it is still far better than scalar computation).
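
For completeness, one possible way to fill in the elided bodies is sketched below. It reuses the question's globals (A1, A2, coora2, hw2), which Numba freezes as compile-time constants, and hoists the cosine/sine of the angle out of the loop, which is the pre-computation mentioned above. Treat it as an untested sketch, not the exact code behind the timing above:

import numba as nb
import numpy as np
from math import floor

@nb.njit
def xy_to_pixelvalue(x, y, dataArray):
    # same logic as in the question, compiled to machine code by Numba
    height = dataArray.shape[0]
    width = dataArray.shape[1]
    if -1 <= x <= 1 and -1 <= y <= 1:
        xcoord = int(floor((x + 1) / 2 * (width - 1)))
        ycoord = int(floor((y + 1) / 2 * (height - 1)))
        return dataArray[ycoord, xcoord]
    else:
        return 0.0

@nb.njit('(float64[::1],)')
def tempsum3(x):
    # the trigonometric terms only depend on x[2], so compute them once per call
    cosal = np.cos(np.radians(x[2]))
    sinal = np.sin(np.radians(x[2]))
    tempsum = 0.0
    for l in range(hw2):  # A1, A2, coora2 and hw2 are the globals defined in the question
        xt = cosal * x[0] * coora2[0, l] - sinal * x[1] * coora2[1, l] + x[3]
        yt = sinal * x[0] * coora2[0, l] + cosal * x[1] * coora2[1, l] + x[4]
        tempsum += (xy_to_pixelvalue(xt, yt, A1) - xy_to_pixelvalue(coora2[0, l], coora2[1, l], A2))**2
    return tempsum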

– Jérôme Richard
  • Thank you so much for this advice. Since I am quite new on Python I had no idea how to find an easy approach to solve my problem. This is very simple and worked excellently (instead of 270718 seconds the programme now just takes 1384 seconds). Thank you! – van25tam May 23 '22 at 07:32

@Jérôme already mentioned a Numba or Cython approach, so I'll address a vectorized numpy version.

Firstly, let's time your tempsum3 function on my machine for a fair comparison:

In [60]: %timeit tempsum3(x0)
24.6 ms ± 192 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

One of the main Python performance bottlenecks is function calls. This motivates us to do as much work as possible per function call, i.e. we want to vectorize the function xy_to_pixelvalue(). For this purpose, x and y are no longer scalars but np.ndarrays. Then, we can use np.where() as a vectorized replacement for your if-else. Additionally, we can replace the loop inside tempsum3 in the same vein by numpy ufuncs.

def xy_to_pixelvalue(x, y, dataArray):
    height = dataArray.shape[0]
    width = dataArray.shape[1]
    xcoords = np.array(np.floor((x+1)/2*(width-1)), dtype=np.int64)
    ycoords = np.array(np.floor((y+1)/2*(height-1)), dtype=np.int64)
    # clip the indices so coordinates outside [-1, 1] cannot raise an IndexError;
    # np.where still returns 0 for those positions
    xcoords = np.clip(xcoords, 0, width - 1)
    ycoords = np.clip(ycoords, 0, height - 1)
    return np.where((-1 <= x) & (x <= 1) & (-1 <= y) & (y <= 1), dataArray[ycoords, xcoords], 0)

def tempsum3(x):
    xx = np.cos(np.radians(x[2]))*x[0]*coora2[0,:] - np.sin(np.radians(x[2]))*x[1]*coora2[1,:] + x[3]
    yy = np.sin(np.radians(x[2]))*x[0]*coora2[0,:] + np.cos(np.radians(x[2]))*x[1]*coora2[1,:] + x[4]
    temp = xy_to_pixelvalue(xx, yy, A1) - xy_to_pixelvalue(coora2[0,:], coora2[1,:], A2)
    return np.sum(temp**2)

Timing again on my machine yields

In [62]: %timeit tempsum3(x0)
315 µs ± 3.29 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Consequently, we are nearly 80x faster.

– joni
  • This approach is also very easy to understand and worked well! Thank you for the alternative opportunity. – van25tam May 23 '22 at 07:36