
I'm simulating US governmental structure and I have a list of n voters (random length between 10,000 and 100,000).

I'd like to create constituencies, which means subdividing that into states, and thence into districts.

All of these are randomly sized.

How can I take a list of n elements and create m lists of random size, which together constitute the original n?

mas
  • Does `m` have to be a fixed value? – Random Davis Jun 09 '21 at 21:53
  • I hadn't thought of that, I'll probably randomize it, but yes, the algorithm should take a fixed number because constituencies are determined. – mas Jun 09 '21 at 21:56
  • [This](https://stackoverflow.com/questions/57915609/split-a-list-into-n-randomly-sized-chunks) looks promising, does that help? – Random Davis Jun 09 '21 at 21:57
  • I think that will do it, I'm going to try now. Thank you! – mas Jun 09 '21 at 21:59
  • @RandomDavis it seems that m is not determined in that case. That's kind of problematic. I can't specify the number of polities. – mas Jun 09 '21 at 22:14
  • But I think the linked duplicates might do. – mas Jun 09 '21 at 22:16
  • Nope. Damn. Determined size bins. – mas Jun 09 '21 at 22:16
  • Randomly sized could mean a lot of things. According to what distribution should the sizes be sampled? Do you expect all of them to be about the same size? Should they follow something like Zipf's law https://en.wikipedia.org/wiki/Zipf%27s_law? – sev Jun 09 '21 at 22:24
  • Totally agree with @sev here - there are different approaches you could take which are all in some sense random but the probability distributions will not be the same. – alani Jun 09 '21 at 22:58

1 Answer


Allow 0-person constituencies

import numpy as np
from numpy.random import multinomial, shuffle
from more_itertools import split_into

# Create test data 
x = np.array(range(20))

# Shuffle test data
shuffle(x)

# Split into 5 constituencies
m = 5
constituencies = multinomial(len(x), [1/m] * m)
result = list(split_into(x, constituencies))
# [[3, 16, 9, 8, 15, 2, 6], [4], [12, 18, 5], [1, 11, 0, 17, 14, 19], [13, 7, 10]]

The multinomial function here tells us how many times each of m possible outcomes occurs across len(x) experiments, where each experiment produces exactly one of the m outcomes. Thus, the sum of all of the values equals the number of experiments, len(x). Then we use split_into, which splits the data into groups, where the size of group i is equal to the value at constituencies[i].
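As a quick sanity check, here is a small sketch (reusing x, constituencies, and result from the snippet above) confirming that no voters are lost or duplicated:

# Sketch: multinomial counts sum to the number of experiments,
# and split_into places every voter in exactly one group.
assert constituencies.sum() == len(x)
assert sum(len(group) for group in result) == len(x)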

Disallow 0-person constituencies

This next part of the code is the "easy" part, but I couldn't find a way to write it nicely. Obviously we don't want zero-person constituencies, so if anyone has a better way of redistributing the values to ensure there are no zeroes, please suggest it! With that said, this code works:

# First pass: bump each zero-sized constituency up to one, and where
# possible immediately take one person back from a later constituency
# that has more than one.
zero_count = 0
for idx, value in enumerate(constituencies):
    if value == 0:
        constituencies[idx] += 1
        zero_count += 1
    elif zero_count > 0 and value > 1:
        constituencies[idx] -= 1
        zero_count -= 1

# Second pass: keep borrowing from constituencies with more than one
# person until the sizes once again sum to len(x).
while zero_count > 0:
    for idx, value in enumerate(constituencies):
        if value > 1:
            constituencies[idx] -= 1
            zero_count -= 1
            if zero_count == 0:
                break

You'll want to insert this block right below the constituencies definition to ensure no constituencies have a size of zero.

Note: This only works if your x has at least m elements: it will get stuck in an infinite loop otherwise.
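If you would rather fail fast than risk that infinite loop, a minimal guard (just a sketch, placed before the redistribution block and reusing x and m from above) could look like this:

# Sketch: refuse to redistribute when there are fewer people than constituencies.
if len(x) < m:
    raise ValueError("need at least m people to give every constituency at least one person")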

Edit

After further thought, I realized that we could write the code for constituencies with at least min_size people in the following way:

import numpy as np
from numpy.random import multinomial, shuffle
from more_itertools import split_into

# Create test data 
x = np.array(range(20))

# Shuffle test data
shuffle(x)

# Split into 5 constituencies
m = 5
min_size = 2

# List of length m, where the ith entry is the number
# of people in the ith constituency
constituencies = multinomial(len(x) - m * min_size, [1/m] * m)
constituencies = [const_size + min_size for const_size in constituencies]
result = list(split_into(x, constituencies))
print(result)

In this case, if your population size, len(x), is smaller than m * min_size, then the first argument to multinomial is negative and the program will raise a ValueError. This method works by holding out min_size * m people, distributing the remainder at random, and then adding min_size to each constituency, ensuring randomness (given that x is sufficiently large) and a minimum constituency size.
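As before, a short sanity check (a sketch reusing the variables above) verifies both guarantees:

# Sketch: every constituency meets the minimum, and the sizes still cover the whole population.
assert all(size >= min_size for size in constituencies)
assert sum(constituencies) == len(x)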

Note on Probability Distribution

As I've noticed the comments discuss which probability distribution to use for constituency sizes, note that the final argument of multinomial, [1/m] * m, is that probability distribution. Here I use a uniform distribution, but you can provide whichever works best for you, so long as it has m entries that sum to 1.
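For example, if you want uneven constituency sizes, one sketch (my own variation, reusing x and m from the snippets above) is to draw heavy-tailed weights and normalize them so they sum to 1, which is what multinomial expects. Note that numpy.random.pareto samples a Lomax distribution, so adding 1 gives a classical Pareto shape:

from numpy.random import multinomial, pareto

# Sketch: one heavy-tailed positive weight per constituency.
weights = pareto(1, m) + 1
# Normalize so the probabilities sum to 1.
distribution = weights / weights.sum()
constituencies = multinomial(len(x), distribution)

Larger weights then translate into larger expected constituency sizes.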

Kraigolas
  • Your final version is fantastic, but I'm not quite sure how to get a nice pareto distribution (it seems to me that's the most accurate description). I tried generating a pareto distributed list with `numpy.random.pareto` but there were... problems. – mas Jun 11 '21 at 12:52
  • To generate `distribution`, which I fed to the final argument of `multinomial`, I generated a pool of Pareto-distributed numbers with `pareto(1, 10000)` and then built a list comprehension thusly: `[clamp(r.choice(PARETOPOOL)) for _ in range(m)]`, where clamp just clips the value between 0 and 1 (because it seems multinomial demands such numbers). – mas Jun 11 '21 at 12:55
  • @malan88 I'm not well versed on the `pareto` function. I suggest you pose a new question on StackOverflow to see if some other users may be able to help. – Kraigolas Jun 11 '21 at 19:16