I'm new to Python. I got this code from the internet (I can't remember the source) and I don't understand how it works. I want the output to show the names of the cities instead of their coordinates. Are the two even linked? That is, once the values go into the DBSCAN algorithm, do they lose their identity, or is there a way to keep it so I can display the city names? Any help, suggestion, or edit to the question is appreciated.

Here is a colab link.

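import numpy as np
import pandas as pd
from geopy.distance import great_circle
from shapely.geometry import MultiPoint
from sklearn.cluster import DBSCAN

# `df` and `coords` come from earlier notebook cells (not shown here).
# The haversine metric expects eps in radians: kilometres / kilometres per radian.
# Note: Earth's mean radius is 6371.0088 km; the value below is kept as posted.
kms_per_radian = 63.710088
epsilon = 1.500 / kms_per_radian
db = DBSCAN(eps=epsilon, min_samples=1, algorithm='ball_tree', metric='haversine').fit(np.radians(coords))
cluster_labels = db.labels_  # one label per row of coords, in the same order
num_clusters = len(set(cluster_labels))
clusters = pd.Series([coords[cluster_labels == n] for n in range(num_clusters)])
print('Number of clusters: {}'.format(num_clusters))

clustersList = clusters.tolist()

def get_centermost_point(cluster):
    # Return the cluster member that lies closest to the cluster's centroid.
    centroid = (MultiPoint(cluster).centroid.x, MultiPoint(cluster).centroid.y)
    centermost_point = min(cluster, key=lambda point: great_circle(point, centroid).m)
    return tuple(centermost_point)

# Compute the representative points first; the lines below depend on them.
centermost_points = clusters.map(get_centermost_point)
lats, lons = zip(*centermost_points)
rep_points = pd.DataFrame({'lon':lons, 'lat':lats})
# Look up the matching row of df for each representative point.
rs = rep_points.apply(lambda row: df[(df['lat']==row['lat']) & (df['lon']==row['lon'])].iloc[0], axis=1)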
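A side note on the `epsilon` line in the code above: with `metric='haversine'`, `eps` is an angle in radians, so a distance in kilometres has to be divided by the number of kilometres per radian, which is the Earth's radius. The sketch below only checks that arithmetic. `EARTH_RADIUS_KM`, `km_to_radians`, and `haversine_km` are names made up for this illustration, and it assumes the commonly quoted mean radius of 6371.0088 km.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius: kilometres spanned by one radian of arc

def km_to_radians(km, radius_km=EARTH_RADIUS_KM):
    """Convert a great-circle distance in km to the angle (radians) it subtends."""
    return km / radius_km

def haversine_km(lat1, lon1, lat2, lon2, radius_km=EARTH_RADIUS_KM):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

# eps as written in the question: 1.5 divided by 63.710088
eps_as_posted = 1.500 / 63.710088
# Convert that angle back to kilometres using the real radius.
effective_km = eps_as_posted * EARTH_RADIUS_KM
print("effective neighbourhood radius in km:", effective_km)

# eps for a true 1.5 km neighbourhood, for comparison
print("eps for 1.5 km:", km_to_radians(1.5))
```

If a 1.5 km neighbourhood is what you intended, the constant would need to be 6371.0088; if a much larger radius was intended, the posted value may be deliberate.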
Rohit Kumar
  • Do you mean reverse-searching the coordinates to find their labels? Is there a better way to do this? I will have about 200 cities and their coordinates, and the same city may appear with different coordinates, or different cities with the same coordinates. – Rohit Kumar Mar 24 '19 at 09:52

1 Answer

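# `names` (not defined in this snippet) must hold the city names in the same row order as `coords`.
# Group the city names by cluster label.
clusters1 = pd.Series([names[cluster_labels == n] for n in range(num_clusters)])
# Group the coordinates by cluster label, as in the question.
clusters = pd.Series([coords[cluster_labels == n] for n in range(num_clusters)])
print(clusters1)
print(clusters)
print(df)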

I went through your code. In `clusters`, the coordinates are grouped by cluster label. `clusters1` does the same thing for the city names: it applies the same labels to the names, so each entry lists the cities that fall in one cluster. I hope that answers your question.
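Why this works: `labels_` holds one label per input row, in the same order as the rows passed to `fit`, so the inputs do not lose their identity. Any array that shares that row order, names included, can be filtered with the same boolean mask. Here is a minimal sketch of the idea; the city names and label values are made up for illustration and are not output from the notebook.

```python
import numpy as np

# Hypothetical stand-in for the notebook's data: one name per coordinate row.
names = np.array(["Delhi", "Noida", "Mumbai", "Thane", "Chennai"])

# Made-up labels in the shape of DBSCAN's labels_ attribute:
# one integer per input row, in the same order as the rows given to fit().
cluster_labels = np.array([0, 0, 1, 1, 2])

num_clusters = len(set(cluster_labels))

# cluster_labels == n is a boolean mask over rows; indexing names with it
# keeps exactly the names whose row was assigned to cluster n.
clusters_by_name = [names[cluster_labels == n].tolist() for n in range(num_clusters)]

for n, members in enumerate(clusters_by_name):
    print(n, members)
```

Note that with `min_samples` greater than 1, DBSCAN can label noise points as `-1`, and `range(num_clusters)` would then skip them; with `min_samples=1`, as in the question, every point belongs to a cluster.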

Justice_Lords
  • Hi, the final print statement gives me the names of the cities, but they are being cut off. Is there a way to get the cities grouped by cluster in list form? Also, can you please explain how you did that? Thanks! – Rohit Kumar Mar 26 '19 at 05:00
  • @RohitKumar Google Colab abbreviates long values as "...". You can always print each entry individually, e.g. `for i in clusters1: print(i)`. In `clusters1` the cities are grouped by cluster label. To convert to a list, see this [SO POST](https://stackoverflow.com/questions/14822680/convert-python-dataframe-to-list). – Justice_Lords Mar 26 '19 at 10:02
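For getting the cities in list form, one option is to store the labels as a column and let pandas do the grouping. This is only a sketch: the `city` and `cluster` column names and the data are assumptions for illustration, and in the notebook the `cluster` column would come from something like `df['cluster'] = db.labels_`.

```python
import pandas as pd

# Hypothetical frame: the 'cluster' values are made up; in the notebook they
# would be the DBSCAN labels assigned row by row.
df = pd.DataFrame({
    "city": ["Delhi", "Noida", "Mumbai", "Thane", "Chennai"],
    "cluster": [0, 0, 1, 1, 2],
})

# Collect the city names of each cluster into a plain Python list.
grouped = df.groupby("cluster")["city"].apply(list)

cities_by_cluster = grouped.to_dict()  # cluster label mapped to its list of cities
cities_as_lists = grouped.tolist()     # list of lists, ordered by cluster label

# Plain lists and dicts print in full, without the "..." abbreviation.
for label, cities in cities_by_cluster.items():
    print(label, cities)
```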