k-means clustering with a size constraint

Question

I have a data set that looks like this:

	segment latitude	longitude	population	territory
0	9011110 54.324926	10.134807	12576	91510
1	9011120 54.340829	10.132004	33311	91510
2	9011131 54.262529	10.048029	57118	91510
3	9011132 54.274682	10.143689	32710	91510
4	9011141 54.307205	10.194190	50674	91510
5	9011142 54.362525	10.326090	43275	91510
6	9011151 54.235966	10.277740	32862	91510
7	9011152 54.264166	10.497505	35592	91510
11	9011222 54.415395	10.037201	56858	91510
12	9011231 54.308678	9.972383	55340	91510
13	9011232 54.359855	10.094303	28429	91510
14	9012110 53.869215	10.689676	14537	91510
15	9012120 53.872428	10.584411	72105	91510
16	9012130 53.821717	10.655533	68443	91510
17	9012140 53.887212	10.747456	38843	91510
18	9012150 53.803897	10.399437	53467	91510
19	9012161 53.620451	10.663859	43066	91510
20	9012162 53.717856	10.725226	31248	91510
21	9012210 53.964882	10.597354	45998	91510
22	9012221 53.919901	10.812425	32346	91510
23	9012222 53.991968	10.732334	35211	91510
24	9012231 54.139890	10.624895	41413	91510
25	9012232 54.163248	10.906545	31381	91510
26	9012241 54.283058	10.902053	24329	91510
27	9012242 54.468495	11.150720	23940	91510
42	9014011 54.085244	9.961614	38804	91510
43	9014012 54.132617	10.109267	45662	91510
48	9014031 53.951740	10.320808	56094	91510
49	9014032 53.908341	9.928129	59323	91510
50	9014033 54.023233	10.122569	44901	91510
58	9015152 53.793006	10.101515	55750	91510
59	9015211 53.668416	10.241130	55399	91510
60	9015212 53.665917	10.348704	50951	91510
61	9015221 53.535001	10.325943	48575	91510
62	9015222 53.595385	10.226776	44630	91510
63	9015230 53.457673	10.371507	45485	91510
64	9015240 53.486733	10.548856	51441	91510

I want to perform k-means clustering on latitude and longitude so that each cluster (territory) has a maximum total population of 300,000. Initially, I thought of iterating over the rows and accumulating the population until the sum reaches 300,000, but then I saw this algorithm for clustering with minimum size constraints, and it seems to be a way out.

But when I tried

clus=KMeansConstrained(n_clusters=8, size_min=300000, size_max=300000, init='k-means++', n_init=10, max_iter=300, tol=0.0001, verbose=False, random_state=None, copy_x=True, n_jobs=1)

I got this error: `size_min and size_max must be a positive number smaller than the number of data points or None`

How can I run k-means with a size constraint of 300,000 on my data?
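As a rough feasibility check (a sketch only, using just the rows pasted above, which may not be the full data set): the error message says `size_min` and `size_max` are compared against the number of data points, so a population cap has to be checked against the `population` column separately. The names `population`, `CAP`, and `min_territories` are made up for this illustration.

```python
import math

# Populations of the rows shown in the question, in the same order.
population = [
    12576, 33311, 57118, 32710, 50674, 43275, 32862, 35592,
    56858, 55340, 28429, 14537, 72105, 68443, 38843, 53467,
    43066, 31248, 45998, 32346, 35211, 41413, 31381, 24329,
    23940, 38804, 45662, 56094, 59323, 44901, 55750, 55399,
    50951, 48575, 44630, 45485, 51441,
]
CAP = 300_000  # maximum total population per territory

total = sum(population)
# Lower bound only: fewer territories than this cannot respect the cap.
min_territories = math.ceil(total / CAP)

print(f"rows: {len(population)}, total population: {total}")
print(f"at least {min_territories} territories are needed for a cap of {CAP}")
```

This gives only a lower bound on the number of territories. Whether a geographically sensible split exists at that number depends on where the segments are.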

If I understand correctly, you want to split your dataset into exactly 8 clusters with exactly 300000 elements per cluster. This can only work if you have exactly 8*300000 = 2400000 elements in your dataset. The dataset you've shown in your question has 64 elements, not 2400000 elements. What are you trying to do with this dataset? — Stef, Oct 06 '21 at 09:05
Not 300000 elements. Each cluster should only contain 300000 people (population). — kukulu, Oct 06 '21 at 09:10
Sorry, I don't make a distinction between "people" and "element". The fact is, you want each cluster to have 300000 people, and there are 8 clusters, so there should be a total of 2400000 people in your dataset. But you have only 64 people, judging by what you've shown us. — Stef, Oct 06 '21 at 09:15
I want to cluster using `latitude` and `longitude`, and each cluster should have a population of 300000. I think the 300000 can be reached with a cumulative algorithm, but my problem is how to put it all together with the constrained k-means. — kukulu, Oct 06 '21 at 09:23
Ooooooooh your datapoints are *weighted* by the value in column "population". I didn't understand that. Well, I don't think `KMeansConstrained` handles weights. The way you called it, it considers "population" to be a field just like "latitude" and "longitude", and uses it to determine if two datapoints are similar. — Stef, Oct 06 '21 at 09:27
The `population` values of the rows in a cluster should sum to 300000. Any rows whose populations sum to 300000 should be assigned to one cluster, like an iteration. — kukulu, Oct 06 '21 at 09:30
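Reading the comments together, the goal seems to be a population-weighted clustering with a cap on each cluster's total population. Below is a minimal pure-Python sketch of one greedy way to do that. It is not `KMeansConstrained` and not an exact method. The function `capped_kmeans` and its parameters are hypothetical, distances are plain Euclidean on latitude/longitude (an approximation), and the demo uses only the first eight rows with a smaller cap of 100,000 so that the cap actually matters.

```python
import math
import random

def capped_kmeans(points, weights, k, cap, n_iter=20, seed=0):
    """Greedy k-means variant keeping each cluster's total weight <= cap.

    points  : list of (lat, lon) tuples
    weights : list of positive populations, one per point
    Returns one cluster label per point; raises ValueError if it cannot fit.
    """
    if max(weights) > cap:
        raise ValueError("a single point already exceeds the cap")
    if sum(weights) > k * cap:
        raise ValueError("k clusters cannot hold the total weight")

    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random initial centers
    labels = None
    # Place heavy points first: they are the hardest to fit under the cap.
    order = sorted(range(len(points)), key=lambda i: -weights[i])

    for _ in range(n_iter):
        load = [0] * k
        new_labels = [None] * len(points)
        for i in order:
            # Try centers from nearest to farthest until one has room.
            nearest_first = sorted(
                range(k), key=lambda c: math.dist(points[i], centers[c])
            )
            for c in nearest_first:
                if load[c] + weights[i] <= cap:
                    new_labels[i] = c
                    load[c] += weights[i]
                    break
            else:
                raise ValueError("greedy packing failed; try a larger k")
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each center to the population-weighted mean of its members.
        for c in range(k):
            members = [i for i in range(len(points)) if labels[i] == c]
            if members:
                w = sum(weights[i] for i in members)
                centers[c] = (
                    sum(points[i][0] * weights[i] for i in members) / w,
                    sum(points[i][1] * weights[i] for i in members) / w,
                )
    return labels

# First eight rows of the question's table: (latitude, longitude) and population.
points = [
    (54.324926, 10.134807), (54.340829, 10.132004),
    (54.262529, 10.048029), (54.274682, 10.143689),
    (54.307205, 10.194190), (54.362525, 10.326090),
    (54.235966, 10.277740), (54.264166, 10.497505),
]
population = [12576, 33311, 57118, 32710, 50674, 43275, 32862, 35592]

labels = capped_kmeans(points, population, k=4, cap=100_000)
loads = [sum(p for p, l in zip(population, labels) if l == c) for c in range(4)]
print(labels, loads)
```

Because the assignment is greedy, it can fail or give poor clusters even when a valid split exists, so treat it as a starting point. On the full data, the same call with `cap=300_000` and a suitable `k` would be the analogous experiment.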
