2

I'm working on a game where a user is given a random reward with each reward's chance of being delivered weighted by a "share" system. To illustrate the issue, let's look at an example:

Let's say I was going to give a pure random reward, out of set: [A, B, C]. It would be as simple as getting a random index and returning whatever reward from the array.

In my use case, however, the reward set is weight with a "share" system. So we have a situation like below:

  • A - Weight of 2.
  • B - Weight of 3.
  • C - Weight of 5.

So the set looks like something like this: [A, A, B, B, B, C, C, C, C, C].

Now, I could just build out that array and get a random result like that, but I'm concerned about performance implications. These rewards are stored in a database (using Django and PSQL as the backend if that factors in), and there may be upwards of 100 potential rewards, each with a weight of 1-100 (or more).

So I'm trying to figure out an efficient method to pull a random reward based on such weight.


As I'm writing this, one thing the comes to mind (but I'd like to hear other thoughts):

  • After a reward set is updated, build out the array once, and then assign each reward a percentage chance. So calculate percentage once.
  • In the above, we would end up with something like: [A: 0.2, B: 0.3, C: 0.5].
  • Then we could get a random number between 0 and 1, and pull the reward that's the closest without going over? (so a random roll of 0.45 would get B since 0.3 is closest to 0.45 without being higher than 0.45).
    • I'm struggling to think how one would write a performant query for this however. Maybe query all rewards with percentages below the roll (0.45 in example above) and return the one with the highest percent from those results.

I think I might have just answered my own question, but would appreciate a third-party perspective. Thanks all!

Bryant Makes Programs
  • 1,493
  • 2
  • 17
  • 39
  • I think there's some logic issues in my proposed answer. If we always select something lower than the roll, we incorrectly favor lower results. So in the example, B has 30% chance of being selects (on paper), but a roll of 0.45 probably should have selected C (which we have a 50% chance of selecting). – Bryant Makes Programs Oct 19 '19 at 17:22
  • Possible duplicate of [Select random row from a PostgreSQL table with weighted row probabilities](https://stackoverflow.com/questions/13040246/select-random-row-from-a-postgresql-table-with-weighted-row-probabilities) – Peter O. Oct 19 '19 at 22:49
  • Eh, kind of sort of. I was too focused on the Django-specific nature of my issue that I overlooked the linked question (which I had seen). Doesn't help that the answer I marked as best is likewise a straight PSQL query (rather than a Django-specific code). – Bryant Makes Programs Oct 19 '19 at 23:13

1 Answers1

2

You should use cumulative percentage. Example setup:

create table weights(label text, weight int, cumulative_percentage float);
insert into weights (label, weight) values
('A', 2),
('B', 3),
('C', 5);

update weights w
set cumulative_percentage = cumulative_sum::float/ total
from (
    select 
        label,
        sum(weight) over (order by label) as cumulative_sum,
        sum(weight) over () as total
    from weights
    ) s
where s.label = w.label;

So the table looks like this:

select *
from weights

 label | weight | cumulative_percentage 
-------+--------+-----------------------
 A     |      2 |                   0.2
 B     |      3 |                   0.5
 C     |      5 |                     1
(3 rows)

Use a common table expression to get a single random number in a loop:

with seed(r) as (select random())
select min(label)
from weights, seed
where r <= cumulative_percentage

or create a function:

create or replace function get_percentageed_reward(r float)
returns text language sql as $$
    select min(label)
    from weights
    where r <= cumulative_percentage
$$; 

Check the results of the alghorithm:

select get_percentageed_reward(random()), count(*)
from generate_series(1, 100000)
group by 1
order by 1

 get_percentageed_reward | count 
-------------------------+-------
 A                       | 20008
 B                       | 30165
 C                       | 49827
(3 rows)    
klin
  • 112,967
  • 15
  • 204
  • 232
  • This was it exactly. The "cumulative" portion was the missing ingredient. So instead of a percentage set of [A: 0.2, B: 0.3, C: 0.5]., it would be [A: 0.2, B: 0.5, C: 1.0]. And we would select the reward closest but higher than the roll. Magic, my good sir, and a well documented answer! – Bryant Makes Programs Oct 19 '19 at 18:06
  • And implemented in my project with fantastic results. Thanks a bunch! – Bryant Makes Programs Oct 19 '19 at 18:19