Allocate items according to an approximate ratio in Python

Question

See update below...

I'm writing a Python simulation that assigns an arbitrary number of imaginary players one goal from an arbitrary pool of goals. The goals have two different levels or proportions of scarcity, prop_high and prop_low, at approximately a 3:1 ratio.

For example, if there are 16 players and 4 goals, or 8 players and 4 goals, the two pools of goals would look like this:

{'A': 6, 'B': 6, 'C': 2, 'D': 2}
{'A': 3, 'B': 3, 'C': 1, 'D': 1}

...with goals A and B occurring 3 times as often as C and D. 6+6+2+2 = 16, which corresponds to the number of players in the simulation, which is good.

I want to have a pool of goals equal to the number of players and distributed so that there are roughly three times as many prop_high goals as there are prop_low goals.

What's the best way to build an allocation algorithm according to a rough or approximate ratio—something that can handle rounding?

Update:

Assuming 8 players, here's how the distributions from 2 to 8 goals should hopefully look (prop_high goals are starred):

    A   B   C   D   E   F   G   H
2   6*  2                       
3   6*  1   1                   
4   3*  3*  1   1               
5   3*  2*  1   1   1           
6   2*  2*  1*  1   1   1       
7   2*  1*  1*  1   1   1   1   
8   1*  1*  1*  1*  1   1   1   1

These numbers don't correspond to players. For example, with 5 goals and 8 players, goals A and B have a high proportion in the pool (3 and 2 respectively) while goals C, D, and E are more rare (1 each).

When there's an odd number of goals, the last of the prop_high gets one less than the others. As the number of goals approaches the number of players, each of the prop_high items gets one less until the end, when there is one of each goal in the pool.

What I've done below is assign quantities to the high and low ends of the pool and then make adjustments to the high end, subtracting values according to how close the number of goals is to the number of players. It works well with 8 players (the number of goals in the pool is always equal to 8), but that's all.

I'm absolutely sure there's a better, more Pythonic way to handle this sort of algorithm, and I'm pretty sure it's a relatively common design pattern. I just don't know where to start googling to find a more elegant way to handle this sort of structure (instead of the brute-force method I'm using for now).

import string
import math
letters = string.uppercase

num_players = 8
num_goals = 5
ratio = (3, 1)

prop_high = ratio[0] / float(sum(ratio)) / (float(num_goals)/2)
prop_low = ratio[1] / float(sum(ratio)) / (float(num_goals)/2)

if num_goals % 2 == 1:
    is_odd = True
else:
    is_odd = False

goals_high = []
goals_low = []
high = []
low = []

# Allocate the goals to the pool. Final result will be incorrect.
count = 0
for i in range(num_goals):
    if count < num_goals/2:         # High proportion
        high.append(math.ceil(prop_high * num_players))
        goals_high.append(letters[i])
    else:                           # Low proportion
        low.append(math.ceil(prop_low * num_players))
        goals_low.append(letters[i])
    count += 1


# Make adjustments to the pool allocations to account for rounding and odd numbers
ratio_high_total = len(high)/float(num_players)
overall_ratio = ratio[1]/float(sum(ratio))
marker = (num_players / 2) + 1
offset = num_goals - marker

if num_players == num_goals:
    for i in high:
        high[int(i)] -= 1
elif num_goals == 1:
    low[0] = num_players
elif ratio_high_total == overall_ratio and is_odd:
    high[-1] -= 1
elif ratio_high_total >= overall_ratio:         # Upper half of possible goals
    print offset

    for i in range(offset):
        index = -(int(i) + 1)
        high[index] -= 1

goals = goals_high + goals_low
goals_quantities = high + low


print "Players:", num_players
print "Types of goals:", num_goals
print "Total goals in pool:", sum(goals_quantities)
print "High pool:", goals_high, high
print "Low pool:", goals_low, low
print goals, goals_quantities
print "High proportion:", prop_high, " || Low proportion:", prop_low

Can you explain what is important in this algorithm and perhaps rephrase the question? If I am understanding correctly, you have a certain number of goals, less than or equal to the number of players. From this pool of goals, you wish to draw a goal for each player such that the distribution of goals among players is such that there are roughly three times as many 'prop_high' goals as there are 'prop_low' goals? Is the ordering of goals among players important? — Cory Dolphin, Dec 31 '11 at 00:43
Sorry, I'll reword it. And yeah, I want to have a pool of goals equal to the number of players and distributed so that there are roughly three times as many `prop_high` goals as there are `prop_low` goals. The ordering isn't important; it's all going to be random eventually. — Andrew, Dec 31 '11 at 00:50
Must all "high" players be assigned the same number of goals? What about "low" players? Are you satisfied if the ratio is matched only in expectation, or do you require that each assignment is as close to the desired ratio as possible? — jrennie, Dec 31 '11 at 03:59
I just clarified all those questions above, I hope. I'm handling the actual assignment to players later... that's working fine. I'm trying to build the pool of possible goals here so that the total number in the pool is equal to the number of players. Players aren't "high" or "low"... goal prevalence is "high" or "low" — Andrew, Dec 31 '11 at 04:51

score 3 · Answer 1 · edited May 23 '17 at 12:08

Some time ago (okay, two and a half years) I asked a question that I think would be relevant here. Here's how I think you could use this: first, build a list of the priorities assigned to each goal. In your example, where the first half of the goal pool (rounded down) gets priority 3 and the rest get priority 1, one way to do this is

priorities = [3] * (len(goals) // 2) + [1] * (len(goals) - len(goals) // 2)

Of course, you can create your list of priorities in any way you want; it doesn't have to be half 3s and half 1s. The only requirement is that all the entries be positive numbers.

Once you have the list, normalize it to have a sum equal to the number of players:

# Assuming num_players is already defined to be the number of players
normalized_priorities = [float(p) / sum(priorities) * num_players
                             for p in priorities]
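The two steps above (build the priorities, then scale them to the number of players) can be sketched in Python 3 as follows. This is only an illustration: the function name normalized_shares and its ratio parameter are hypothetical, and fractions.Fraction is used purely to show that the scaled shares sum exactly to the number of players before any rounding happens.

```python
from fractions import Fraction

def normalized_shares(num_goals, num_players, ratio=(3, 1)):
    """Return the ideal (fractional) share of the pool for each goal.

    Assumption: the first half of the goals (rounded down) gets the high
    priority and the rest get the low priority, as in the question.
    """
    high, low = ratio
    high_count = num_goals // 2
    priorities = [high] * high_count + [low] * (num_goals - high_count)
    total = sum(priorities)
    # Scale each priority so that all shares add up to num_players.
    return [Fraction(p * num_players, total) for p in priorities]

shares = normalized_shares(5, 17)
print([float(s) for s in shares])
print(sum(shares))  # the exact total, before rounding
```

The shares are generally not integers, which is why a rounding step that preserves the total is still needed.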

Then apply one of the algorithms from my question to round these floating-point numbers to integers representing the actual allocations. Among the answers given, there are only two algorithms that do the rounding properly and satisfy the minimum variance criterion: adjusted fractional distribution (including the "Update" paragraph) and minimizing roundoff error. Conveniently, both of them appear to work for non-sorted lists. Here are my Python implementations:

import math, operator
from heapq import nlargest
from itertools import izip
item1 = operator.itemgetter(1)  # sort key: fractional part
item2 = operator.itemgetter(2)  # sort key: original index

def floor(f):
    return int(math.floor(f))
def frac(f):
    return math.modf(f)[0]

def adjusted_fractional_distribution(fn_list):
    in_list = [floor(f) for f in fn_list]
    loss_list = [frac(f) for f in fn_list]
    fsum = math.fsum(loss_list)
    add_list = [0] * len(in_list)
    largest = nlargest(int(round(fsum)), enumerate(loss_list),
                 key=lambda e: (e[1], e[0]))
    for i, loss in largest:
        add_list[i] = 1
    return [i + a for i,a in izip(in_list, add_list)]

def minimal_roundoff_error(fn_list):
    N = int(round(math.fsum(fn_list)))  # round so that e.g. 16.999999 becomes 17
    temp_list = [[floor(f), frac(f), i] for i, f in enumerate(fn_list)]
    temp_list.sort(key = item1)
    lower_sum = sum(floor(f) for f in fn_list)
    difference = N - lower_sum
    for i in xrange(len(temp_list) - difference, len(temp_list)):
        temp_list[i][0] += 1
    temp_list.sort(key = item2)
    return [t[0] for t in temp_list]

In all my tests, both these methods are exactly equivalent, so you can pick either one to use.
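For readers on Python 3, where izip and xrange no longer exist, the same idea can be sketched more compactly. This is an illustrative largest-remainder rounding, not the code from the linked answers: the name largest_remainder and its tie-breaking rule (among equal fractional parts, the later index is rounded up first) are assumptions, chosen so that it agrees with minimal_roundoff_error for 17 players and priorities [3, 3, 1, 1, 1].

```python
import math

def largest_remainder(shares):
    """Round non-negative floats to ints while preserving their total.

    Assumes the shares sum (up to floating-point error) to an integer.
    """
    total = int(round(math.fsum(shares)))
    result = [int(math.floor(s)) for s in shares]
    shortfall = total - sum(result)
    # Indices ordered by fractional part, largest first; ties go to the
    # later index (an arbitrary choice made for this sketch).
    order = sorted(range(len(shares)),
                   key=lambda i: (shares[i] - result[i], i),
                   reverse=True)
    for i in order[:shortfall]:
        result[i] += 1
    return result

priorities = [3, 3, 1, 1, 1]
num_players = 17
shares = [p * num_players / sum(priorities) for p in priorities]
print(largest_remainder(shares))  # [5, 6, 2, 2, 2]
```

Every entry is either the floor or the ceiling of its ideal share, so no goal ends up more than one away from its target.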

Here's a usage example:

>>> goals = 'ABCDE'
>>> num_players = 17
>>> priorities = [3,3,1,1,1]
>>> normalized_priorities = [float(p) / sum(priorities) * num_players
...                          for p in priorities]
>>> normalized_priorities
[5.666666..., 5.666666..., 1.888888..., 1.888888..., 1.888888...]
>>> minimal_roundoff_error(normalized_priorities)
[5, 6, 2, 2, 2]

If you want to allocate the extra players to the first goals within a group of equal priority, rather than the last, probably the easiest way to do this is to reverse the list before and after applying the rounding algorithm.

>>> def rlist(l):
...     return list(reversed(l))
>>> rlist(minimal_roundoff_error(rlist(normalized_priorities)))
[6, 5, 2, 2, 2]

Now, this may not quite match the distributions you expect, because in my question I specified a "minimum variance" criterion that I used to judge the result. That might not be appropriate for your case. You could try the "remainder distribution" algorithm instead of one of the two I mentioned above and see if it works better for you.

def remainder_distribution(fn_list):
    N = int(round(math.fsum(fn_list)))  # integer total, immune to float noise
    rn_list = [int(round(f)) for f in fn_list]
    remainder = N - sum(rn_list)
    first = 0
    last = len(fn_list) - 1
    while remainder > 0 and last >= 0:
        if abs(rn_list[last] + 1 - fn_list[last]) < 1:
            rn_list[last] += 1
            remainder -= 1
        last -= 1
    while remainder < 0 and first < len(rn_list):
        if abs(rn_list[first] - 1 - fn_list[first]) < 1:
            rn_list[first] -= 1
            remainder += 1
        first += 1
    return rn_list

Holy crap, this is incredible! I just played around with these algorithms and it seems to be working! I'll mess around with them more in the morning to make sure everything is working. Thanks! — Andrew, Dec 31 '11 at 06:00

score 3 · Accepted Answer · answered Dec 31 '11 at 11:01

Rather than try to get the fractions right, I'd just allocate the goals one at a time in the appropriate ratio. Here the allocate_goals generator yields each of the low-ratio goals once, then each of the high-ratio goals (repeated 3 times), and then starts over. The caller, allocate, cuts off this infinite generator at the required number (the number of players) using itertools.islice.

import collections
import itertools
import string

def allocate_goals(prop_low, prop_high):
    prop_high3 = prop_high * 3
    while True:
        for g in prop_low:
            yield g
        for g in prop_high3:
            yield g

def allocate(goals, players):
    letters = string.ascii_uppercase[:goals]
    high_count = goals // 2
    prop_high, prop_low = letters[:high_count], letters[high_count:]
    g = allocate_goals(prop_low, prop_high)
    return collections.Counter(itertools.islice(g, players))

for goals in xrange(2, 9):
    print goals, sorted(allocate(goals, 8).items())

It produces this answer:

2 [('A', 6), ('B', 2)]
3 [('A', 4), ('B', 2), ('C', 2)]
4 [('A', 3), ('B', 3), ('C', 1), ('D', 1)]
5 [('A', 3), ('B', 2), ('C', 1), ('D', 1), ('E', 1)]
6 [('A', 2), ('B', 2), ('C', 1), ('D', 1), ('E', 1), ('F', 1)]
7 [('A', 2), ('B', 1), ('C', 1), ('D', 1), ('E', 1), ('F', 1), ('G', 1)]
8 [('A', 1), ('B', 1), ('C', 1), ('D', 1), ('E', 1), ('F', 1), ('G', 1), ('H', 1)]
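The code above is Python 2 (xrange, print statement). A possible Python 3 port is sketched below; the weight parameter and the ValueError guard are additions made for this sketch and are not part of the original answer.

```python
import collections
import itertools
import string

def allocate_goals(prop_low, prop_high, weight=3):
    """Endlessly yield each low goal once, then each high goal `weight` times."""
    repeated_high = prop_high * weight
    while True:
        yield from prop_low
        yield from repeated_high

def allocate(goals, players, weight=3):
    if goals < 1:
        # With no goals the generator would loop forever without yielding.
        raise ValueError("need at least one goal")
    letters = string.ascii_uppercase[:goals]
    high_count = goals // 2
    prop_high, prop_low = letters[:high_count], letters[high_count:]
    pool = itertools.islice(allocate_goals(prop_low, prop_high, weight), players)
    return collections.Counter(pool)

for goals in range(2, 9):
    print(goals, sorted(allocate(goals, 8).items()))
```

Compared with the table in the question, the only row that differs for 8 players is the one for 3 goals: this approach gives 4, 2, 2, where the question hoped for 6, 1, 1.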

The great thing about this approach (apart from, I think, that it's easy to understand) is that it's quick to turn it into a randomized version.

Just replace allocate_goals with this:

import random

def allocate_goals(prop_low, prop_high):
    all_goals = prop_low + prop_high * 3
    while True:
        yield random.choice(all_goals)
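A self-contained Python 3 variant of the randomized idea might look like the sketch below. It uses random.choices with explicit weights instead of repeating the high goals in a string; the function name allocate_random and the seed parameter are hypothetical and only there to make runs reproducible.

```python
import collections
import random
import string

def allocate_random(goals, players, weight=3, seed=None):
    """Draw one goal per player; high goals are `weight` times as likely."""
    rng = random.Random(seed)
    letters = list(string.ascii_uppercase[:goals])
    high_count = goals // 2
    weights = [weight] * high_count + [1] * (goals - high_count)
    # Each draw is independent, so the ratio holds only on average.
    return collections.Counter(rng.choices(letters, weights=weights, k=players))

pool = allocate_random(5, 8, seed=1)
print(sorted(pool.items()))
print(sum(pool.values()))  # always equals the number of players
```

With random draws the pool size is still exact, but the 3:1 ratio is matched only in expectation, and with few players some goals may not appear in the pool at all.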
