Algorithm/Query For Determine Weighted Reward

Question

I'm working on a game where a user is given a random reward with each reward's chance of being delivered weighted by a "share" system. To illustrate the issue, let's look at an example:

Let's say I was going to give a pure random reward, out of set: [A, B, C]. It would be as simple as getting a random index and returning whatever reward from the array.

In my use case, however, the reward set is weight with a "share" system. So we have a situation like below:

A - Weight of 2.
B - Weight of 3.
C - Weight of 5.

So the set looks like something like this: [A, A, B, B, B, C, C, C, C, C].

Now, I could just build out that array and get a random result like that, but I'm concerned about performance implications. These rewards are stored in a database (using Django and PSQL as the backend if that factors in), and there may be upwards of 100 potential rewards, each with a weight of 1-100 (or more).

So I'm trying to figure out an efficient method to pull a random reward based on such weight.

As I'm writing this, one thing the comes to mind (but I'd like to hear other thoughts):

After a reward set is updated, build out the array once, and then assign each reward a percentage chance. So calculate percentage once.
In the above, we would end up with something like: [A: 0.2, B: 0.3, C: 0.5].
Then we could get a random number between 0 and 1, and pull the reward that's the closest without going over? (so a random roll of 0.45 would get B since 0.3 is closest to 0.45 without being higher than 0.45).
- I'm struggling to think how one would write a performant query for this however. Maybe query all rewards with percentages below the roll (0.45 in example above) and return the one with the highest percent from those results.

I think I might have just answered my own question, but would appreciate a third-party perspective. Thanks all!

I think there's some logic issues in my proposed answer. If we always select something lower than the roll, we incorrectly favor lower results. So in the example, B has 30% chance of being selects (on paper), but a roll of 0.45 probably should have selected C (which we have a 50% chance of selecting). — Bryant Makes Programs, Oct 19 '19 at 17:22
Possible duplicate of [Select random row from a PostgreSQL table with weighted row probabilities](https://stackoverflow.com/questions/13040246/select-random-row-from-a-postgresql-table-with-weighted-row-probabilities) — Peter O., Oct 19 '19 at 22:49
Eh, kind of sort of. I was too focused on the Django-specific nature of my issue that I overlooked the linked question (which I had seen). Doesn't help that the answer I marked as best is likewise a straight PSQL query (rather than a Django-specific code). — Bryant Makes Programs, Oct 19 '19 at 23:13

score 2 · Accepted Answer · answered Oct 19 '19 at 17:57

You should use cumulative percentage. Example setup:

create table weights(label text, weight int, cumulative_percentage float);
insert into weights (label, weight) values
('A', 2),
('B', 3),
('C', 5);

update weights w
set cumulative_percentage = cumulative_sum::float/ total
from (
    select 
        label,
        sum(weight) over (order by label) as cumulative_sum,
        sum(weight) over () as total
    from weights
    ) s
where s.label = w.label;

So the table looks like this:

select *
from weights

 label | weight | cumulative_percentage 
-------+--------+-----------------------
 A     |      2 |                   0.2
 B     |      3 |                   0.5
 C     |      5 |                     1
(3 rows)

Use a common table expression to get a single random number in a loop:

with seed(r) as (select random())
select min(label)
from weights, seed
where r <= cumulative_percentage

or create a function:

create or replace function get_percentageed_reward(r float)
returns text language sql as $$
    select min(label)
    from weights
    where r <= cumulative_percentage
$$;

Check the results of the alghorithm:

select get_percentageed_reward(random()), count(*)
from generate_series(1, 100000)
group by 1
order by 1

 get_percentageed_reward | count 
-------------------------+-------
 A                       | 20008
 B                       | 30165
 C                       | 49827
(3 rows)

This was it exactly. The "cumulative" portion was the missing ingredient. So instead of a percentage set of [A: 0.2, B: 0.3, C: 0.5]., it would be [A: 0.2, B: 0.5, C: 1.0]. And we would select the reward closest but higher than the roll. Magic, my good sir, and a well documented answer! — Bryant Makes Programs, Oct 19 '19 at 18:06
And implemented in my project with fantastic results. Thanks a bunch! — Bryant Makes Programs, Oct 19 '19 at 18:19

Algorithm/Query For Determine Weighted Reward

1 Answers1