Looping over DataFrames like this is generally not recommended. Instead, you should try to vectorize your code as much as possible.
First, we create arrays for your inputs:
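(For reference, the snippets below assume numpy/pandas are imported and a DataFrame df2 with columns x1..x5, y1..y5, z1..z5. If you want to try this without your own data, a throwaway stand-in could be built along these lines; the random values and the gridrez of 10 are made up purely for illustration.)
import numpy as np
import pandas as pd

# Made-up stand-in data with the same column layout as df2
rng = np.random.default_rng(0)
n = 10000
gridrez = 10   # made-up grid resolution, use whatever you pass to bi2Dlinter
cols = [f'{axis}{i}' for axis in ('x', 'y', 'z') for i in range(1, 6)]
df2 = pd.DataFrame(rng.random((n, len(cols))), columns=cols)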
x_vals = df2[['x1','x2','x3','x4','x5']].values
y_vals = df2[['y1','y2','y3','y4','y5']].values
z_vals = df2[['z1','z2','z3','z4','z5']].values
Next, we need a version of the bi2Dlinter function that handles array inputs. This involves changing the linspace/meshgrid step to work over an array, and changing the least-squares step. Most np.linalg functions broadcast over a stack of matrices, but as far as I'm aware lstsq doesn't, so we can use the SVD to replicate the same least-squares solve over the whole stack.
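As a quick illustration of that broadcasting behaviour, np.linalg.svd will decompose a whole stack of design matrices in a single call:
# np.linalg.svd broadcasts over leading dimensions: a stack of 10000
# 5x3 matrices is decomposed at once
A = np.random.rand(10000, 5, 3)
u, s, vt = np.linalg.svd(A, full_matrices=False)
print(u.shape, s.shape, vt.shape)   # (10000, 5, 3) (10000, 3) (10000, 3, 3)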
def create_ranges(start, stop, N, endpoint=True):
    # Row-wise np.linspace: one linspace per (start, stop) pair
    if endpoint:
        divisor = N - 1
    else:
        divisor = N
    steps = (1.0/divisor) * (stop - start)
    return steps[:,None]*np.arange(N) + start[:,None]
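To see what create_ranges does, each row gets its own linspace between its own start and stop:
# Two independent linspaces computed in one call
starts = np.array([0.0, 10.0])
stops = np.array([1.0, 20.0])
print(create_ranges(starts, stops, N=5))
# row 0: [0, 0.25, 0.5, 0.75, 1]
# row 1: [10, 12.5, 15, 17.5, 20]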
def linspace_nd(x, y, gridrez):
    # Row-wise equivalent of np.meshgrid(np.linspace(xmin, xmax, gridrez),
    #                                    np.linspace(ymin, ymax, gridrez))
    a1 = create_ranges(x.min(axis=1), x.max(axis=1), N=gridrez, endpoint=True)
    a2 = create_ranges(y.min(axis=1), y.max(axis=1), N=gridrez, endpoint=True)
    out_shp = a1.shape + (a2.shape[1],)
    Xout = np.broadcast_to(a1[:,None,:], out_shp)
    Yout = np.broadcast_to(a2[:,:,None], out_shp)
    return Xout, Yout
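For each row, this returns a pair of gridrez-by-gridrez coordinate grids, just like meshgrid would in the per-row case:
# One (gridrez x gridrez) grid pair per row of df2
X, Y = linspace_nd(x_vals, y_vals, gridrez)
print(X.shape, Y.shape)   # (len(df2), gridrez, gridrez) for both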
def stacked_lstsq(L, b, rcond=1e-10):
    """
    Solve L x = b via SVD least squares, cutting off small singular values.
    L is an array of shape (..., M, N) and b of shape (..., M).
    Returns x of shape (..., N).
    """
    u, s, v = np.linalg.svd(L, full_matrices=False)
    s_max = s.max(axis=-1, keepdims=True)
    s_min = rcond*s_max
    inv_s = np.zeros_like(s)
    inv_s[s >= s_min] = 1/s[s >= s_min]
    x = np.einsum('...ji,...j->...i', v,
                  inv_s * np.einsum('...ji,...j->...i', u, b.conj()))
    return np.conj(x, x)
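To convince yourself this matches the unstacked solver, you can compare one slice against np.linalg.lstsq (a quick sanity check):
# The stacked SVD solve agrees with np.linalg.lstsq system by system
L = np.random.rand(100, 5, 3)      # 100 systems, 5 equations, 3 unknowns
b = np.random.rand(100, 5)
x_stacked = stacked_lstsq(L, b)
x_single, *_ = np.linalg.lstsq(L[0], b[0], rcond=None)
print(np.allclose(x_stacked[0], x_single))   # True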
def vectorized_bi2Dlinter(x_vals, y_vals, z_vals, gridrez):
    # Fit the plane z = a*x + b*y + c for every row at once,
    # then evaluate it on each row's grid
    X, Y = linspace_nd(x_vals, y_vals, gridrez)
    A = np.stack((x_vals, y_vals, np.ones_like(z_vals)), axis=2)
    C = stacked_lstsq(A, z_vals)
    n_bcast = C.shape[0]
    return (C.T[0].reshape((n_bcast,1,1))*X
            + C.T[1].reshape((n_bcast,1,1))*Y
            + C.T[2].reshape((n_bcast,1,1)))
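Calling it once on the stacked arrays gives one interpolated grid per row:
# One fitted (gridrez x gridrez) surface per row
res = vectorized_bi2Dlinter(x_vals, y_vals, z_vals, gridrez)
print(res.shape)   # (len(df2), gridrez, gridrez)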
Testing this on data with n = 10000 rows, the vectorized function was significantly faster.
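(The timings below assume a per-row bi2Dlinter roughly along these lines; this is only a hypothetical reconstruction that fits the plane z = a*x + b*y + c on a meshgrid, so substitute your own definition.)
from scipy import linalg

# Hypothetical sketch of the per-row version being benchmarked below;
# replace with your actual bi2Dlinter
def bi2Dlinter(x, y, z, gridrez):
    X, Y = np.meshgrid(np.linspace(min(x), max(x), gridrez),
                       np.linspace(min(y), max(y), gridrez))
    A = np.c_[x, y, np.ones(len(x))]
    C, _, _, _ = linalg.lstsq(A, z)
    return C[0]*X + C[1]*Y + C[2]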
%%timeit
ZZ = []
for index, row in df2.iterrows():
    x = row['x1'], row['x2'], row['x3'], row['x4'], row['x5']
    y = row['y1'], row['y2'], row['y3'], row['y4'], row['y5']
    z = row['z1'], row['z2'], row['z3'], row['z4'], row['z5']
    ZZ.append(bi2Dlinter(x, y, z, gridrez))
df2['ZZ'] = ZZ
Out: 5.52 s ± 17.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
res = vectorized_bi2Dlinter(x_vals,y_vals,z_vals,gridrez)
Out: 74.6 ms ± 159 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
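Finally, outside the timing cells, it is worth checking that the two approaches agree (assuming your bi2Dlinter returns the gridrez-by-gridrez surface as above):
# Sanity check: both approaches should produce the same surface row for row
res = vectorized_bi2Dlinter(x_vals, y_vals, z_vals, gridrez)
check = bi2Dlinter(tuple(x_vals[0]), tuple(y_vals[0]), tuple(z_vals[0]), gridrez)
print(np.allclose(res[0], check))   # should print True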
You should pay careful attention to what's going on in these vectorized functions and familiarize yourself with broadcasting in NumPy. I cannot take credit for the first three functions; the Stack Overflow answers they come from are linked below so you can get a fuller understanding:
Vectorized NumPy linspace for multiple start and stop values
how to solve many overdetermined systems of linear equations using vectorized codes?
How to use numpy.c_ properly for arrays