I have a series of values that I want to split up into 5 bins, so that the sum of the values in the bins is approximately the same.
AD8_normal
075e5e2b-dd96-4097-8fc3-c3b82f610277 18
bd9a3e1f-b7a1-4ad9-ac8f-c6009dd5073a 18
b50fa764-1a1d-4760-bd6f-6efbba316adb 16
8cc1408b-9509-485a-8603-41895a53925d 14
3cb98c8c-918f-45fd-bd9f-11b0a6d19838 13
5fa9da0d-bfcb-48da-8bb3-e245e995e3e6 13
5950b2d8-b51a-4c84-86f6-40ee59ddf8c7 12
bbd07332-92a1-4f05-b7de-a0164b017343 12
9b67c710-f5ee-4a4b-ba80-eaf905a1a4fc 12
140f98ce-2ffe-4aca-accb-76f1cf2f08a9 12
177b0245-8ff4-4a34-b15b-1ddf909e0c9e 11
b5aaf5c4-fd3d-4781-9eaf-92e7dee4a234 11
e24f86ce-5cfe-487a-8f23-929fb7674a4a 11
b7ddaf74-adde-4bdf-9e07-f0d1a06176e7 11
5b442d58-78d8-4f3e-9d86-6782dd9945ca 10
acdcd702-dcae-45bc-b56f-1f50e4610c2a 10
c5ac2af7-e040-4d0f-b281-704349338463 10
10f815fa-16d4-4efb-a3e5-05b2f197688e 10
7098c542-7475-4030-87dd-6405952d179f 10
67566970-81f7-4f69-8765-2e1fbe242e62 9
7f2d7f97-da2e-408d-8f62-d46b27cc19fb 8
e0c43c9f-764f-44e9-86af-c80417253ba3 8
efcd4689-a4f1-4bcc-810a-57a7c3141f14 8
acbf916d-fafc-4f4d-8a2e-eeb3c35d4040 8
2fa4333d-dc1c-4f68-a745-c1cbdf90ed95 7
9b96a2c2-f3c5-479a-9666-1d4c44a17270 7
7e1070a8-8f96-440e-998f-409383a9faf1 7
90b1971d-c70f-495a-bdb0-cc3c5b67bbe1 7
Name: id, dtype: int64
I can randomly sample indices, in which case I would expect most bins to lead to similar total counts. However, there's obviously the chance that I accidentally sample the top N indices in a single bin, so that bin becomes 'overweight'. Is there a way to cut up this series into 5 parts (i.e., 5 sets of indices) so each part has a value sum that is approximately the same?
To reproduce the array:
idx = ['075e5e2b-dd96-4097-8fc3-c3b82f610277', 'bd9a3e1f-b7a1-4ad9-ac8f-c6009dd5073a', 'b50fa764-1a1d-4760-bd6f-6efbba316adb', '8cc1408b-9509-485a-8603-41895a53925d', '3cb98c8c-918f-45fd-bd9f-11b0a6d19838', '5fa9da0d-bfcb-48da-8bb3-e245e995e3e6', '5950b2d8-b51a-4c84-86f6-40ee59ddf8c7', 'bbd07332-92a1-4f05-b7de-a0164b017343', '9b67c710-f5ee-4a4b-ba80-eaf905a1a4fc', '140f98ce-2ffe-4aca-accb-76f1cf2f08a9', '177b0245-8ff4-4a34-b15b-1ddf909e0c9e', 'b5aaf5c4-fd3d-4781-9eaf-92e7dee4a234', 'e24f86ce-5cfe-487a-8f23-929fb7674a4a', 'b7ddaf74-adde-4bdf-9e07-f0d1a06176e7', '5b442d58-78d8-4f3e-9d86-6782dd9945ca', 'acdcd702-dcae-45bc-b56f-1f50e4610c2a', 'c5ac2af7-e040-4d0f-b281-704349338463', '10f815fa-16d4-4efb-a3e5-05b2f197688e', '7098c542-7475-4030-87dd-6405952d179f', '67566970-81f7-4f69-8765-2e1fbe242e62', '7f2d7f97-da2e-408d-8f62-d46b27cc19fb', 'e0c43c9f-764f-44e9-86af-c80417253ba3', 'efcd4689-a4f1-4bcc-810a-57a7c3141f14', 'acbf916d-fafc-4f4d-8a2e-eeb3c35d4040', '2fa4333d-dc1c-4f68-a745-c1cbdf90ed95', '9b96a2c2-f3c5-479a-9666-1d4c44a17270', '7e1070a8-8f96-440e-998f-409383a9faf1', '90b1971d-c70f-495a-bdb0-cc3c5b67bbe1']
vals = [18, 18, 16, 14, 13, 13, 12, 12, 12, 12, 11, 11, 11, 11, 10, 10, 10, 10, 10, 9, 8, 8, 8, 8, 7, 7, 7, 7]
AD8_normal = pandas.Series(data=vals, index=idx)