I am trying to cluster my latitude/longitude data into 12 different areas, but the k-means algorithm is messing up badly. I tried just 2 clusters and it broke badly (picture attached); it didn't work well for 12 either. I know k-means is sensitive to noise, and I cleaned that out as well.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from collections import Counter
df = pd.read_csv("all.csv")
df = df.dropna()
df = df.loc[ ~(df["area"]=="FarEast")]

df["Latitude"] = df["Latitude"].astype(float)
df["Longitude"] = df["Longitude"].astype(float)
df = df.drop(df.nsmallest(4,"Longitude").index)
X=df.loc[:,['Latitude','Longitude']]

X = X.reset_index()
id_n=2
kmeans = KMeans(n_clusters=id_n, random_state=0).fit(X)
id_label=kmeans.labels_
# plot result
ptsymb = np.array(['b.','r.','m.','g.','c.','k.','b*','r*','m*','r^'])
plt.figure(figsize=(12,12))
plt.ylabel('Longitude', fontsize=12)
plt.xlabel('Latitude', fontsize=12)

# import itertools
# marker = itertools.cycle((',', '+', '.', 'o', '*')) 



for i in range(id_n):
    cluster=np.where(id_label==i)[0]
    plt.plot(X.Latitude[cluster].values,X.Longitude[cluster].values,ptsymb[i])
plt.show()

Cluster Images

Has QUIT--Anony-Mousse
Taher Hozefa
  • Try searching. Your question might have been answered already: https://stackoverflow.com/questions/53075481/how-do-i-cluster-a-list-of-geographic-points-by-distance (with many links to similar questions) – Sergey Bushmanov Mar 13 '19 at 06:50

1 Answer

Clearly there is something wrong with your indexing.

The result you plotted is impossible for k-means on these two attributes. It's not noise robustness that causes such effects - even then, k-means clusters would necessarily be Voronoi cells.
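To make the Voronoi point concrete, here is a minimal sketch on made-up data (the points and `n_clusters=3` are assumptions, not taken from the question). It checks that every label k-means returns is the index of the nearest cluster center, which is what makes each cluster a Voronoi cell of the centers.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical 2-D points standing in for (Latitude, Longitude) pairs.
rng = np.random.default_rng(0)
points = rng.uniform(0.0, 10.0, size=(300, 2))

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(points)

# Distance from every point to every center, shape (n_points, n_clusters).
dists = np.linalg.norm(points[:, None, :] - km.cluster_centers_[None, :, :], axis=2)
nearest = dists.argmin(axis=1)

# Each label should be the index of the nearest center.
print(np.array_equal(km.labels_, nearest))
```

If the same check fails on the two columns you plot, the labels were computed from something other than those two columns.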

Either you used different attributes, or different row indexes. So the error is somewhere in your invocation, not in k-means.
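One candidate for "different attributes", assuming the posted code is exactly what was run: `X.reset_index()` without `drop=True` keeps the old index as an extra column named `index`, so `KMeans` is fitted on three columns rather than two. The sketch below uses synthetic data as a stand-in for `all.csv`; the point values, the shuffled index, and the `agreement` helper are all hypothetical and only there for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical stand-in for all.csv: two well-separated groups of points.
n = 200
true_group = np.repeat([0, 1], n // 2)
lat = np.where(true_group == 0, 10.0, 50.0) + rng.normal(0, 1, n)
lon = np.where(true_group == 0, 20.0, 80.0) + rng.normal(0, 1, n)
df = pd.DataFrame({"Latitude": lat, "Longitude": lon})

# Assumption: row labels are large and unrelated to location.
df.index = rng.permutation(n) * 1000

X = df.loc[:, ["Latitude", "Longitude"]]

# As in the question: the old index becomes an extra feature column.
X_bad = X.reset_index()
print(list(X_bad.columns))  # ['index', 'Latitude', 'Longitude']

# drop=True discards the old index instead of adding it as a column.
X_ok = X.reset_index(drop=True)

labels_bad = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_bad).labels_
labels_ok = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_ok).labels_

def agreement(labels, truth):
    """Fraction of points matching the true groups, ignoring label order."""
    acc = np.mean(labels == truth)
    return max(acc, 1 - acc)

print("with index column:", agreement(labels_bad, true_group))
print("lat/lon only:     ", agreement(labels_ok, true_group))
```

In the question's code, the equivalent change would be `X = X.reset_index(drop=True)`, or fitting on `X[['Latitude', 'Longitude']]` only. Whether this is the actual cause in your data is a guess based on the posted snippet.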

Has QUIT--Anony-Mousse