I have a data set that looks like this:

segment latitude longitude population territory
0 9011110 54.324926 10.134807 12576 91510
1 9011120 54.340829 10.132004 33311 91510
2 9011131 54.262529 10.048029 57118 91510
3 9011132 54.274682 10.143689 32710 91510
4 9011141 54.307205 10.194190 50674 91510
5 9011142 54.362525 10.326090 43275 91510
6 9011151 54.235966 10.277740 32862 91510
7 9011152 54.264166 10.497505 35592 91510
11 9011222 54.415395 10.037201 56858 91510
12 9011231 54.308678 9.972383 55340 91510
13 9011232 54.359855 10.094303 28429 91510
14 9012110 53.869215 10.689676 14537 91510
15 9012120 53.872428 10.584411 72105 91510
16 9012130 53.821717 10.655533 68443 91510
17 9012140 53.887212 10.747456 38843 91510
18 9012150 53.803897 10.399437 53467 91510
19 9012161 53.620451 10.663859 43066 91510
20 9012162 53.717856 10.725226 31248 91510
21 9012210 53.964882 10.597354 45998 91510
22 9012221 53.919901 10.812425 32346 91510
23 9012222 53.991968 10.732334 35211 91510
24 9012231 54.139890 10.624895 41413 91510
25 9012232 54.163248 10.906545 31381 91510
26 9012241 54.283058 10.902053 24329 91510
27 9012242 54.468495 11.150720 23940 91510
42 9014011 54.085244 9.961614 38804 91510
43 9014012 54.132617 10.109267 45662 91510
48 9014031 53.951740 10.320808 56094 91510
49 9014032 53.908341 9.928129 59323 91510
50 9014033 54.023233 10.122569 44901 91510
58 9015152 53.793006 10.101515 55750 91510
59 9015211 53.668416 10.241130 55399 91510
60 9015212 53.665917 10.348704 50951 91510
61 9015221 53.535001 10.325943 48575 91510
62 9015222 53.595385 10.226776 44630 91510
63 9015230 53.457673 10.371507 45485 91510
64 9015240 53.486733 10.548856 51441 91510

I want to run K-means clustering on latitude and longitude so that each cluster (territory) has a total population of at most 300,000. Initially, I thought of iterating with a cumulative sum until it reaches 300,000, but then I saw this Algorithm for clustering with minimum size constraints, and it seems to be a way out.

But when I tried:

clus=KMeansConstrained(n_clusters=8, size_min=300000, size_max=300000, init='k-means++', n_init=10, max_iter=300, tol=0.0001, verbose=False, random_state=None, copy_x=True, n_jobs=1)

I got this error: `size_min and size_max must be a positive number smaller than the number of data points or None`

How can I run k-means with a size constraint of 300,000 on my data?
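As a rough sanity check (a sketch, not part of the original attempt): the error message says `size_min` and `size_max` are compared against the number of data points, so they appear to count rows rather than sum the `population` column. A population cap therefore has to be enforced separately, and the smallest possible number of territories is the total population divided by the cap, rounded up. The helper name `min_territories` is hypothetical; the figures are the first nine rows of the data above.

```python
def min_territories(populations, cap=300_000):
    """Lower bound on the number of territories needed so that no
    territory's summed population exceeds `cap` (hypothetical helper)."""
    total = sum(populations)
    # Integer ceiling division: -(-a // b) rounds a / b up.
    return -(-total // cap)


# Populations of the first nine rows shown in the data above.
populations = [12576, 33311, 57118, 32710, 50674, 43275, 32862, 35592, 56858]
lower_bound = min_territories(populations)
print(lower_bound)
```

This only gives a lower bound; whether that many territories is actually reachable depends on how the individual segment populations pack under the cap.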

Mahrkeenerh
kukulu
  • If I understand correctly, you want to split your dataset into exactly 8 clusters with exactly 300000 elements per cluster. This can only work if you have exactly 8*300000 = 2400000 elements in your dataset. The dataset you've shown in your question has 64 elements, not 2400000 elements. What are you trying to do with this dataset? – Stef Oct 06 '21 at 09:05
  • Not 300000 elements. Each cluster should contain only 300000 people (population). – kukulu Oct 06 '21 at 09:10
  • Sorry, I don't make a distinction between "people" and "element". The fact is, you want each cluster to have 300000 people, and there are 8 clusters, so there should be a total of 2400000 people in your dataset. But you have only 64 people, judging by what you've shown us. – Stef Oct 06 '21 at 09:15
  • I want to cluster using `latitude` and `longitude`, and each cluster should contain a population of 300000. I think the 300000 can be achieved with a cumulative algorithm, but how to put it all together with `k-means constrained` is my problem. – kukulu Oct 06 '21 at 09:23
  • Ooooooooh your datapoints are *weighted* by the value in column "population". I didn't understand that. Well, I don't think `KMeansConstrained` handles weights. The way you called it, it considers "population" to be a field just like "latitude" and "longitude", and uses it to determine if two datapoints are similar. – Stef Oct 06 '21 at 09:27
  • The `population` values of the rows in a cluster should sum up to 300000. Any rows that sum to 300000 should be assigned to one cluster, as in an `iteration`. – kukulu Oct 06 '21 at 09:30
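The cumulative idea discussed in these comments could be sketched roughly as below. This is a greedy heuristic, not k-means and not something `KMeansConstrained` is known to offer. The function name `greedy_territories`, the choice of the northernmost segment as seed, and the use of plain Euclidean distance on degrees are all assumptions made for illustration. It treats 300000 as an upper limit, since sums of exactly 300000 are unlikely to exist.

```python
from math import hypot


def greedy_territories(segments, cap=300_000):
    """Group (segment, latitude, longitude, population) rows so that each
    group's summed population stays at or below `cap`.

    Greedy sketch: grow one territory at a time by repeatedly adding the
    nearest unassigned segment that still fits under the cap. A single
    segment larger than `cap` would end up alone in an oversized territory.
    """
    unassigned = list(segments)
    territories = []
    while unassigned:
        # Arbitrary seed: the northernmost remaining segment.
        seed = max(unassigned, key=lambda s: s[1])
        unassigned.remove(seed)
        members, total = [seed], seed[3]
        while True:
            # Unweighted centroid of the territory built so far.
            c_lat = sum(m[1] for m in members) / len(members)
            c_lon = sum(m[2] for m in members) / len(members)
            # Only segments that keep the running total under the cap.
            fits = [s for s in unassigned if total + s[3] <= cap]
            if not fits:
                break
            nearest = min(fits, key=lambda s: hypot(s[1] - c_lat, s[2] - c_lon))
            unassigned.remove(nearest)
            members.append(nearest)
            total += nearest[3]
        territories.append(members)
    return territories


# First nine rows of the data set in the question.
rows = [
    (9011110, 54.324926, 10.134807, 12576),
    (9011120, 54.340829, 10.132004, 33311),
    (9011131, 54.262529, 10.048029, 57118),
    (9011132, 54.274682, 10.143689, 32710),
    (9011141, 54.307205, 10.194190, 50674),
    (9011142, 54.362525, 10.326090, 43275),
    (9011151, 54.235966, 10.277740, 32862),
    (9011152, 54.264166, 10.497505, 35592),
    (9011222, 54.415395, 10.037201, 56858),
]
territories = greedy_territories(rows)
for number, members in enumerate(territories):
    print(number, sum(m[3] for m in members), [m[0] for m in members])
```

The result depends on the seed order and gives no guarantee of compact or balanced territories, so it is best read as a starting point rather than a replacement for a proper constrained clustering method.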
