
For kriging, I need to compute large mesh arrays of length 20000. The code below works fine, especially for small mesh lengths (< 100); however, the computation time for such a large mesh is very long (approx. 45 min). The length of the data ranges between 300 and 800. Below is a working version of the code:

import numpy as np
from scipy.spatial.distance import pdist, squareform    

icovfct = [36, 6524.62, 1383.13, 2]
imesh = np.array([[632230, 632090, 632110, 632130, 632150, 632170, 632190, 632210, 632230, 632250, 632270, 632290, 632310, 632070], 
                  [3045160, 3045180, 3045180, 3045180, 3045180, 3045180, 3045180, 3045180, 3045180, 3045180, 3045180, 3045180, 3045180, 3045200]], np.float64)
imesh = imesh.T
idata = np.array([[634026.049, 633901.182, 634001.365, 634007.132, 633893.706, 633802.327, 634144.246, 634015.993, 633897.326, 633779.479],
                  [3048117.579, 3048201.031, 3048191.922, 3047891.355, 3047994.462, 3048084.562, 3047633.421, 3047719.845, 3047818.914, 3047902.179],
                  [256.550, 236.317, 249.458, 281.889, 262.321, 239.495, 303.144, 295.319, 281.270, 261.083]], np.float64)
idata = idata.T

def ordinary(covfct, data, mesh):
    prediction = []              
    for i, dummy_val in enumerate(mesh):
        # distances between the current mesh point and every data point
        d = np.sqrt((data[:, 0]-mesh[i, 0])**2.0 + (data[:, 1]-mesh[i, 1])**2.0)

        # add these distances to P
        P = np.vstack((data.T, d)).T

        # apply the covariance model to the distances
        k = (covfct[0] + covfct[1]*(1-np.exp(-3*(P[:,3]/covfct[2])**covfct[3])))

        # cast as a matrix
        k = np.matrix(k).T

        # form a matrix of distances between existing data points
        K1 = squareform(pdist(P[:,:2]))

        # apply the covariance model to these distances
        K = (covfct[0] + covfct[1]*(1-np.exp(-3*(K1.ravel()/covfct[2])**covfct[3])))

        # re-cast as a NumPy array                  
        K = np.array(K)

        # reshape into an array
        K = K.reshape(len(P), len(P))               

        # for exact kriging, the diagonal (zero-distance entries) must be 0
        K[K1 == 0] = 0

        # cast as a matrix
        K = np.matrix(K)

        # add a column and row of ones to Ks,
        # with a zero in the bottom, right hand corner (Lagrangian)
        K2 = np.matrix(np.ones((len(P)+1, len(P)+1)))
        K2[:len(P), :len(P)] = K
        K2[-1, -1] = 0.0

        # add a one to the end of ks
        k3 = np.matrix(np.ones((len(P)+1, 1)))
        k3[:len(P)] = k

        # solve the kriging system for the weights
        weights = np.linalg.lstsq(K2, k3, rcond=1)
        # keep the solution vector from the lstsq output and drop the Lagrange multiplier
        weights = weights[:-3][0][:-1]
        weights = np.array(weights)

        # the data values at the conditioning points (used in the weighted sum)
        residuals = P[:, 2]

        # calculate the estimation
        prediction.append(np.dot(weights.T, residuals))
    return prediction

interpolate = ordinary(icovfct, idata, imesh)

Is there a way to optimize the code and hence reduce the computational time?

user2554925
  • Since each iteration seems independent, you could split your mesh into n sections and run the processing on an n-thread pool. – Donatien May 14 '20 at 14:53
  • What is a mesh? an array? can you include a small mesh in this example so that we can run it? – Paul H May 14 '20 at 14:55
  • https://stackoverflow.com/help/minimal-reproducible-example – Paul H May 14 '20 at 14:55
  • do we need to download the files? can you just make like a 15x15 array with ~5 - 8 values to be interpolated? – Paul H May 14 '20 at 15:17

2 Answers


Generally, you can reduce the time by splitting one process into many.

In Python, the multiprocessing library can do that, so if you have a mesh of length 20,000, you can run the code twice at the same time and each process will handle a length of 10,000.
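For example, here is a minimal sketch using the standard library's multiprocessing.Pool, reusing the question's ordinary, icovfct, idata and imesh (the process count of 4 is an illustrative assumption):

import numpy as np
from multiprocessing import Pool

def interpolate_chunk(mesh_chunk):
    # each worker runs the unchanged kriging loop on its slice of the mesh
    return ordinary(icovfct, idata, mesh_chunk)

if __name__ == "__main__":
    n_procs = 4  # illustrative; match your CPU core count
    # split the mesh rows into roughly equal pieces, one per process
    chunks = np.array_split(imesh, n_procs)
    with Pool(processes=n_procs) as pool:
        results = pool.map(interpolate_chunk, chunks)
    # stitch the per-chunk predictions back into one flat list
    prediction = [p for chunk in results for p in chunk]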

I hope this answer fixes your problem.


You could use GSTools as a drop-in replacement for your code, where the kriging summation is implemented in Cython:

import numpy as np
import gstools as gs

icovfct = [36, 6524.62, 1383.13, 2]
imesh = np.array([[632230, 632090, 632110, 632130, 632150, 632170, 632190, 632210, 632230, 632250, 632270, 632290, 632310, 632070], 
                  [3045160, 3045180, 3045180, 3045180, 3045180, 3045180, 3045180, 3045180, 3045180, 3045180, 3045180, 3045180, 3045180, 3045200]], np.float64)
idata = np.array([[634026.049, 633901.182, 634001.365, 634007.132, 633893.706, 633802.327, 634144.246, 634015.993, 633897.326, 633779.479],
                  [3048117.579, 3048201.031, 3048191.922, 3047891.355, 3047994.462, 3048084.562, 3047633.421, 3047719.845, 3047818.914, 3047902.179],
                  [256.550, 236.317, 249.458, 281.889, 262.321, 239.495, 303.144, 295.319, 281.270, 261.083]], np.float64)

rescale = np.power(3, icovfct[3] ** -1)  # to provide the rescaling factor "3"
# what you are using is a stable covariance model
model = gs.Stable(
    dim=2,
    nugget=icovfct[0],
    var=icovfct[1],
    len_scale=icovfct[2],
    alpha=icovfct[3],
    rescale=rescale,
)
# use ordinary kriging
krige = gs.krige.Ordinary(model, cond_pos=idata[:2], cond_val=idata[2])
mesh = krige(imesh, mesh_type="structured")
ax = krige.plot()

[plot of the kriging result]

MuellerSeb
  • If your given mesh wasn't meant to be structured, just edit the second-to-last line: `mesh_type="unstructured"`. – MuellerSeb Jan 18 '21 at 12:20
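For completeness, a minimal sketch of that edit, reusing the krige object and imesh from the answer above:

mesh = krige(imesh, mesh_type="unstructured")  # treat imesh as scattered (x, y) points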