With 0 person constituencies
import numpy as np
from numpy.random import multinomial, shuffle
from more_itertools import split_into
# Create test data
x = np.array(range(20))
# Shuffle test data
shuffle(x)
# Split into 5 constituencies
m = 5
constituencies = multinomial(len(x), [1/m] * m)
result = list(split_into(x, constituencies))
# [[3, 16, 9, 8, 15, 2, 6], [4], [12, 18, 5], [1, 11, 0, 17, 14, 19], [13, 7, 10]]
The multinomial
function here tells us how many times each of m
results occurs after len(x)
experiments have been performed, given that the result of an experiment is one of the m
results. Thus, the sum of all of the values is equal to the number of experiments -- len(x)
. Then, we use split_into
, which will split the data up into groups, where the size of group i
is equal to the value at constituencies[i]
.
Disallow 0 person constituencies
This next part of the code is the "easy" part, but I couldn't find a way to write it nicely. Obviously we don't want zero person constituencies, so if anyone has a better way of redistributing the values to ensure there are no zeroes, please suggest them! With that said, this code works:
zero_count = 0
for idx, value in enumerate(constituencies):
if value == 0:
constituencies[idx] += 1
zero_count += 1
elif zero_count > 0 and value > 1:
constituencies[idx] -= 1
zero_count -= 1
while zero_count > 0:
for idx, value in enumerate(constituencies):
if value > 1:
constituencies[idx] -= 1
zero_count -= 1
if zero_count == 0:
break
You'll want to insert this block right below the constituencies
definition to ensure no constituencies have a size of zero.
Note: This only works if your x
has at least m
elements: it will get stuck in an infinite loop otherwise.
Edit
After further thought, I realized that we could write the code for constituencies with at least min_size
people in the following way:
import numpy as np
from numpy.random import multinomial, shuffle
from more_itertools import split_into
# Create test data
x = np.array(range(20))
# Shuffle test data
shuffle(x)
# Split into 5 constituencies
m = 5
min_size = 2
# List of length m, where the ith entry is the number
# of people in the ith constituency
constituencies = multinomial(len(x) - m * min_size, [1/m] * m)
constituencies = [const_size + min_size for const_size in constituencies]
result = list(split_into(x, constituencies))
print(result)
In this case, if the size of your population -- len(x)
-- is too small for your minimum population size, then the program will raise a ValueError
going into multinomial
. This method works by holding out min_size * m
people, and then adding min_size
to each constituency, ensuring randomness (given that x
is sufficiently large) and a minimum constituency size.
Note on Probability Distribution
As I've noticed the comments discuss the chosen probability distribution for constituency size, note that the final argument of multinomial
: [1/m] * m
, is the probability distribution. Here, I use a uniform probability distribution, but you can provide whichever works best for you (so long as it has m
elements).