Before talking about parallelization, you can work on optimizing your loop. The first improvement is to iterate over the data itself instead of iterating over index values up to the length and accessing the data at each step:
# toy sample
import numpy as np
import pandas as pd

np.random.seed(1)
size_nbi = 20
size_Final = 100
nbi = pd.DataFrame({'LONG_017': np.random.random(size=size_nbi)/100 + 73,
                    'LAT_016': np.random.random(size=size_nbi)/100 + 73})
Final = pd.DataFrame({'X': np.random.random(size=size_Final)/100 + 73,
                      'Y': np.random.random(size=size_Final)/100 + 73,
                      'CUM_OCC': np.random.randint(size_Final, size=size_Final)})
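The snippets below call haversine, which is not redefined here; a minimal scalar version consistent with the vectorized formula used further down (inputs in degrees, converted to radians inside, Earth radius 6372.8 km) could look like this:

from math import radians, sin, cos, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    # great-circle distance in km between two points given in degrees
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    return 6372.8 * 2 * asin(sqrt(a))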
Using your method, you get about 75 ms for dataframes of this size:
%%timeit
for i in range(len(nbi.LONG_017)):
    StoredOCC = []
    for j in range(len(Final.X)):
        r = haversine(nbi.LONG_017[i], nbi.LAT_016[i], Final.X[j], Final.Y[j])
        if (r < 0.03048):
            SWw = Final.CUM_OCC[j]
            StoredOCC.append(SWw)
    if len(StoredOCC) != 0:
        nbi.loc[i, 'ADTT_02_2019'] = np.max(StoredOCC)
# 75.6 ms ± 4.05 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Now if you change the loops slightly, iterating over the data themselves, collecting the results in a list, and assigning that list as a column once outside the loops, you can get down to about 5 ms, so 15 times faster, just by optimizing (and you might find even better optimizations):
%%timeit
res_ADIT = []
for lon1, lat1 in zip(nbi.LONG_017.to_numpy(),
                      nbi.LAT_016.to_numpy()):
    StoredOCC = []
    for lon2, lat2, SWw in zip(Final.X.to_numpy(),
                               Final.Y.to_numpy(),
                               Final.CUM_OCC.to_numpy()):
        r = haversine(lon1, lat1, lon2, lat2)
        if (r < 0.03048):
            StoredOCC.append(SWw)
    if len(StoredOCC) != 0:
        res_ADIT.append(np.max(StoredOCC))
    else:
        res_ADIT.append(np.nan)
nbi['ADIT_v2'] = res_ADIT
# 5.23 ms ± 305 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Now you can go even further: use numpy to vectorize the second loop entirely, and pass the values already converted to radians instead of mapping them to radians on every call:
# empty list for the result
res_ADIT = []
# work with arrays in radians
arr_lon2 = np.radians(Final.X.to_numpy())
arr_lat2 = np.radians(Final.Y.to_numpy())
arr_OCC = Final.CUM_OCC.to_numpy()
for lon1, lat1 in zip(np.radians(nbi.LONG_017),  # pass the radians directly
                      np.radians(nbi.LAT_016)):
    # do all the subtractions in a vectorized way
    arr_dlon = arr_lon2 - lon1
    arr_dlat = arr_lat2 - lat1
    # same haversine formula, using numpy functions
    arr_dist = np.sin(arr_dlat/2)**2 + np.cos(lat1) * np.cos(arr_lat2) * np.sin(arr_dlon/2)**2
    arr_dist = 2 * np.arcsin(np.sqrt(arr_dist))
    arr_dist *= 6372.8
    # extract the values of CUM_OCC that meet the criterion
    r = arr_OCC[arr_dist < 0.03048]
    # check that there is at least one element
    if r.size > 0:
        res_ADIT.append(max(r))
    else:
        res_ADIT.append(np.nan)
nbi['AUDIT_np'] = res_ADIT
If you timeit this, you get down to 819 µs ± 15.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each), so about 90 times faster than your original solution for this small dataframe size, and all the results are the same:
print(nbi)
LONG_017 LAT_016 ADTT_02_2019 ADIT_v2 AUDIT_np
0 73.004170 73.008007 NaN NaN NaN
1 73.007203 73.009683 30.0 30.0 30.0
2 73.000001 73.003134 14.0 14.0 14.0
3 73.003023 73.006923 82.0 82.0 82.0
4 73.001468 73.008764 NaN NaN NaN
5 73.000923 73.008946 NaN NaN NaN
6 73.001863 73.000850 NaN NaN NaN
7 73.003456 73.000391 NaN NaN NaN
8 73.003968 73.001698 21.0 21.0 21.0
9 73.005388 73.008781 NaN NaN NaN
10 73.004192 73.000983 93.0 93.0 93.0
11 73.006852 73.004211 NaN NaN NaN
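If you want to verify programmatically that the three columns agree (NaN included), a quick NaN-aware check could be:

# NaN-aware check that all three approaches give the same values
print(np.allclose(nbi['ADTT_02_2019'], nbi['ADIT_v2'], equal_nan=True))
print(np.allclose(nbi['ADIT_v2'], nbi['AUDIT_np'], equal_nan=True))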
You can play a bit with the code and increase the size of each toy dataframe; you will see that as the sizes grow, especially the size of Final, the gain becomes even more interesting: for example with size_nbi = 20 and size_Final = 1000, the vectorized solution is about 400 times faster, and so on. For your full data size (3K * 6M), you would still need some time: I estimate about 25 minutes on my computer, while your original solution would take hundreds of hours. If this is still not enough, you can think about parallelization or about using numba.