Good morning. I have a DB with almost 1.3 million rows (Lunar Craters DB) and I want to cluster the craters that lie inside bigger craters. To do that, I sorted the DB from largest to smallest and then, for each bigger crater, iterate over the remaining ones to check whether the distance between their positions falls within the bigger crater's radius (half its diameter). The problem is that this calculation takes about 50 seconds per crater, so it would take months to process the whole DB. I tried some alternative techniques such as Dask and multiprocessing, but they didn't work. I hope someone can help me.
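from datetime import datetime

start = datetime.now()
cluster = 1
# craters_diam is sorted by diameter, largest first
for i in range(len(craters_diam)):
    start2 = datetime.now()
    # Only a crater that is not yet assigned starts a new cluster
    if craters_diam.loc[i, 'CLUSTER'] == 0:
        craters_diam.loc[i, 'CLUSTER'] = cluster
        lat1 = craters_diam.loc[i, 'LAT_CIRC_IMG']
        lon1 = craters_diam.loc[i, 'LON_CIRC_IMG']
        diam = craters_diam.loc[i, 'DIAM_CIRC_IMG']
        # Check every smaller, still unassigned crater
        for j in range(i+1, len(craters_diam)):
            if craters_diam.loc[j, 'CLUSTER'] == 0:
                lat2 = craters_diam.loc[j, 'LAT_CIRC_IMG']
                lon2 = craters_diam.loc[j, 'LON_CIRC_IMG']
                dist = distance(lat1, lat2, lon1, lon2)
                # Inside the bigger crater if within its radius
                if dist <= diam/2:
                    craters_diam.loc[j, 'CLUSTER'] = cluster
        cluster += 1
    print(datetime.now() - start2)  # time spent on this crater
print(datetime.now() - start)  # total elapsed time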
The distance function computes the distance using spherical geometry.
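For reference, here is a minimal sketch of what such a function might look like. It assumes the haversine formula, coordinates in degrees, distances in kilometres, and a mean lunar radius of 1737.4 km. The actual function may differ; only the argument order (lat1, lat2, lon1, lon2) is taken from the call in the loop.

```python
import math

MOON_RADIUS_KM = 1737.4  # assumption: mean lunar radius, in km


def distance(lat1, lat2, lon1, lon2, radius=MOON_RADIUS_KM):
    """Great-circle (haversine) distance between two points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # min() guards against rounding pushing sqrt(a) slightly above 1
    return 2 * radius * math.asin(min(1.0, math.sqrt(a)))
```

Because the formula only uses the sine of half the longitude difference, it gives the same result for longitudes in 0..360 and in -180..180.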
If anyone knows a cleverer (faster) way to do that, thank you!
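One possible direction, shown only as a sketch: most of the time in the loop above probably goes into the per-element `.loc` lookups and the Python-level inner loop, so the inner loop could be replaced by a single NumPy computation per bigger crater. The names `haversine_km` and `assign_clusters` are hypothetical, the lunar radius of 1737.4 km and the haversine formula are assumptions, and the input arrays are assumed to be already sorted by diameter, largest first.

```python
import numpy as np

MOON_RADIUS_KM = 1737.4  # assumption: mean lunar radius, in km


def haversine_km(lat1, lon1, lat2, lon2, radius=MOON_RADIUS_KM):
    """Haversine distance from one point to arrays of points (degrees in, km out)."""
    phi1 = np.radians(lat1)
    phi2 = np.radians(lat2)
    dphi = phi2 - phi1
    dlam = np.radians(lon2) - np.radians(lon1)
    a = np.sin(dphi / 2) ** 2 + np.cos(phi1) * np.cos(phi2) * np.sin(dlam / 2) ** 2
    return 2 * radius * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def assign_clusters(lat, lon, diam):
    """Label craters with the same rule as the nested loops.

    lat, lon, diam must be sorted by diam in descending order.
    Returns an integer array of cluster labels starting at 1.
    """
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    diam = np.asarray(diam, dtype=float)
    n = len(lat)
    labels = np.zeros(n, dtype=np.int64)
    cluster = 1
    for i in range(n):
        if labels[i] != 0:
            continue  # already inside a bigger crater
        labels[i] = cluster
        rest = labels[i + 1:]  # view onto the smaller craters
        dist = haversine_km(lat[i], lon[i], lat[i + 1:], lon[i + 1:])
        # Unassigned craters whose centre lies within the radius of crater i
        inside = (rest == 0) & (dist <= diam[i] / 2)
        rest[inside] = cluster  # writes through the view into labels
        cluster += 1
    return labels
```

With the DataFrame from the question, this would be called on `craters_diam['LAT_CIRC_IMG'].to_numpy()`, `craters_diam['LON_CIRC_IMG'].to_numpy()` and `craters_diam['DIAM_CIRC_IMG'].to_numpy()`, and the result assigned back to the `CLUSTER` column. The outer loop is still there, so each bigger crater still scans all smaller ones. A spatial index such as scikit-learn's `BallTree` with `metric='haversine'` might reduce that further, but that is not shown here.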