I have a data set that looks like this:
| | segment | latitude | longitude | population | territory |
|---|---|---|---|---|---|
| 0 | 9011110 | 54.324926 | 10.134807 | 12576 | 91510 |
| 1 | 9011120 | 54.340829 | 10.132004 | 33311 | 91510 |
| 2 | 9011131 | 54.262529 | 10.048029 | 57118 | 91510 |
| 3 | 9011132 | 54.274682 | 10.143689 | 32710 | 91510 |
| 4 | 9011141 | 54.307205 | 10.194190 | 50674 | 91510 |
| 5 | 9011142 | 54.362525 | 10.326090 | 43275 | 91510 |
| 6 | 9011151 | 54.235966 | 10.277740 | 32862 | 91510 |
| 7 | 9011152 | 54.264166 | 10.497505 | 35592 | 91510 |
| 11 | 9011222 | 54.415395 | 10.037201 | 56858 | 91510 |
| 12 | 9011231 | 54.308678 | 9.972383 | 55340 | 91510 |
| 13 | 9011232 | 54.359855 | 10.094303 | 28429 | 91510 |
| 14 | 9012110 | 53.869215 | 10.689676 | 14537 | 91510 |
| 15 | 9012120 | 53.872428 | 10.584411 | 72105 | 91510 |
| 16 | 9012130 | 53.821717 | 10.655533 | 68443 | 91510 |
| 17 | 9012140 | 53.887212 | 10.747456 | 38843 | 91510 |
| 18 | 9012150 | 53.803897 | 10.399437 | 53467 | 91510 |
| 19 | 9012161 | 53.620451 | 10.663859 | 43066 | 91510 |
| 20 | 9012162 | 53.717856 | 10.725226 | 31248 | 91510 |
| 21 | 9012210 | 53.964882 | 10.597354 | 45998 | 91510 |
| 22 | 9012221 | 53.919901 | 10.812425 | 32346 | 91510 |
| 23 | 9012222 | 53.991968 | 10.732334 | 35211 | 91510 |
| 24 | 9012231 | 54.139890 | 10.624895 | 41413 | 91510 |
| 25 | 9012232 | 54.163248 | 10.906545 | 31381 | 91510 |
| 26 | 9012241 | 54.283058 | 10.902053 | 24329 | 91510 |
| 27 | 9012242 | 54.468495 | 11.150720 | 23940 | 91510 |
| 42 | 9014011 | 54.085244 | 9.961614 | 38804 | 91510 |
| 43 | 9014012 | 54.132617 | 10.109267 | 45662 | 91510 |
| 48 | 9014031 | 53.951740 | 10.320808 | 56094 | 91510 |
| 49 | 9014032 | 53.908341 | 9.928129 | 59323 | 91510 |
| 50 | 9014033 | 54.023233 | 10.122569 | 44901 | 91510 |
| 58 | 9015152 | 53.793006 | 10.101515 | 55750 | 91510 |
| 59 | 9015211 | 53.668416 | 10.241130 | 55399 | 91510 |
| 60 | 9015212 | 53.665917 | 10.348704 | 50951 | 91510 |
| 61 | 9015221 | 53.535001 | 10.325943 | 48575 | 91510 |
| 62 | 9015222 | 53.595385 | 10.226776 | 44630 | 91510 |
| 63 | 9015230 | 53.457673 | 10.371507 | 45485 | 91510 |
| 64 | 9015240 | 53.486733 | 10.548856 | 51441 | 91510 |

I want to perform K-means clustering on `latitude` and `longitude` so that each cluster (territory) has a maximum total population of 300,000.
Initially, I thought of running a cumulative iteration over the segments until each group reaches 300,000. But then I saw this algorithm for clustering with minimum size constraints, and it seems to be a way out.
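
To make the cumulative idea concrete, here is roughly what I had in mind. This is only a sketch, not K-means: the function name `assign_territories`, the sort by latitude then longitude, and the small `cap` used in the demo are my own assumptions for illustration.

```python
# Hypothetical sketch of the "cumulative iteration" idea: walk through the
# segments in a fixed order and start a new territory whenever adding the
# next segment would push the running population above the cap.
MAX_POPULATION = 300_000


def assign_territories(rows, cap=MAX_POPULATION):
    """rows: iterable of (segment, latitude, longitude, population).

    Returns a dict mapping segment -> territory index (0, 1, 2, ...).
    """
    # Assumption: sorting by latitude, then longitude, keeps consecutive
    # segments roughly close to each other. It is not a real spatial clustering.
    ordered = sorted(rows, key=lambda r: (r[1], r[2]))
    labels = {}
    territory = 0
    running_total = 0
    for segment, _lat, _lon, population in ordered:
        # A segment that alone exceeds the cap still gets its own territory.
        if running_total and running_total + population > cap:
            territory += 1
            running_total = 0
        labels[segment] = territory
        running_total += population
    return labels


# First four rows of the data above, with a small cap so the split is visible.
sample = [
    (9011110, 54.324926, 10.134807, 12576),
    (9011120, 54.340829, 10.132004, 33311),
    (9011131, 54.262529, 10.048029, 57118),
    (9011132, 54.274682, 10.143689, 32710),
]
labels = assign_territories(sample, cap=100_000)
```

This respects the population cap but ignores how compact the territories are, which is why I was looking at a clustering approach instead.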
But when I tried

`clus = KMeansConstrained(n_clusters=8, size_min=300000, size_max=300000, init='k-means++', n_init=10, max_iter=300, tol=0.0001, verbose=False, random_state=None, copy_x=True, n_jobs=1)`

I got this error:

`size_min and size_max must be a positive number smaller than the number of data points or None`

How can I run K-means with a population size constraint of 300,000 per cluster on my data?