1

I need a SQL query to determine a random winner. Each user has it's own winning odds. The more the winning_odds value is high, the more the user has chances to win. Here's a look at the table structure:

id    email            winning_odds
1     test@test.com    3
2     test2@test.com   5
3     test3@test.com   2
4     test4@test.com   1
5     test5@test.com   9

MySQL database. Table has approximately 100000 rows. There's only one winner, one time. Emails are unique. Anyone has a solution?

Thanks.

Leo Stein
  • 159
  • 1
  • 3
  • 17

4 Answers4

1

Select email from user order by winning_odds*rand() limit 1

histocrat
  • 2,291
  • 12
  • 21
  • I can't figure out if this do the trick. I don't think so, is it? – Leo Stein Nov 07 '12 at 20:36
  • Whether it works properly depends on how, quantitatively, you want the winning_odds to affect the percentage chance of each user winning. If all you care about is that users with higher winning_odds have higher odds of winning, then yes, this works. – histocrat Nov 07 '12 at 20:38
  • 5*rand() is 5 to 9 as likely to be higher than 9*Rand(). It's simple and elegant and most importantly works. – jTC Nov 07 '12 at 20:53
  • That's not actually true; it's somewhere less than 5 to 9. I haven't worked out the formula, although I'm sure someone has. But consider the simplest case: two users with winning_odds of 1 and 2, respectively. 1:2 odds would mean the second user wins 2/3 of the time. But given a random number between 0 and 1, and another between 0 and 2, half the time the second will be greater than 1, and win automatically. The other half of the time, both numbers are between 0 and 1, so the second number wins half the time. So the second number wins 3/4 of the time. – histocrat Nov 07 '12 at 21:04
1

I really liked this question, I'm posting the answer for postgresql.

select 
    *, generate_series(1, mytable.winning_odds) 
from 
    mytable 
order by 
    random() 
limit 1;

This is how it works. For each row of your table, we replicate the row N times as your winning odds.

So you get at first and without limiting the query results:

  5 | test5@test.com |    9 | 9
  2 | test2@test.com |    5 | 3
  3 | test3@test.com |    2 | 1
  1 | test@test.com  |    3 | 1
  5 | test5@test.com |    9 | 5
  1 | test@test.com  |    3 | 3
  5 | test5@test.com |    9 | 2
  2 | test2@test.com |    5 | 4
  2 | test2@test.com |    5 | 5
  5 | test5@test.com |    9 | 1
  4 | test4@test.com |    1 | 1
  5 | test5@test.com |    9 | 7
  5 | test5@test.com |    9 | 4
  5 | test5@test.com |    9 | 6
  2 | test2@test.com |    5 | 1
  5 | test5@test.com |    9 | 8
  3 | test3@test.com |    2 | 2
  1 | test@test.com  |    3 | 2
  2 | test2@test.com |    5 | 2
  5 | test5@test.com |    9 | 3

Now, selecting randomly any row of the generated table will reflect the probabilities of your winning_odds field.

All you have to do is to order it randomly and get the first record.

 9 | test5@test.com |    9 | 2 

Regards

andrefsp
  • 3,580
  • 2
  • 26
  • 30
  • Well thanks, this seems to do the trick. Too bad that MySQL doesn't have the generate_series feature. I guess I'll have to create another table that simulate it. – Leo Stein Nov 07 '12 at 21:00
  • Have a look on this post http://stackoverflow.com/questions/6870499/generate-series-equivalent-in-mysql It contains how to do the generate_series like in mySQL – andrefsp Nov 07 '12 at 22:21
1

I am speculating that the "odds" are not integers and that you want something that has a "9" to be nine times more likely than a "1".

The proper way to do this is with a cumulative sum. Then generate a random value between the min and max of the cumulative sum and choose the record that is in that range. The following query does this in MySQL:

select t.*
from (select t.*,
             coalesce((select sum(odds) from t t2 where t2.id < t.id), 0) as cumsum,
             const.sumodds
      from t cross join
           (select rand()*sum(odds) as val from t) const
     ) t
where val between cumsum and cumsum + t.odds

However, this is doing a non-equijoin and would probably be prohibitively expensive in MySQL. Other databases have the ability to do a cumulative sum in a single query. MySQL does not have an efficient way of doing this.

How to optimize the query depends on certain other factors in the problem. How many different values do "odds" take on? Can you use temporary tables?

I don't have the time right now to write out the solution, but there is a more efficient way. YThe idea is to split the problem into two searches. The first will find which "odds" value wins. The second will find which row wins.

Here are the details:

(1) Summarize the data into a table by the odds. This table would have 11 rows, and contain the "odds" and the "count" for each.

(2) Calculate the sum of "count*odds" for each row, starting at 0 for the first row. You can use the above query as a guide, since this is such a small amount of data it will run quickly.

(3) Calculate a random number as rand()*<sum of all odds>. Now, locate the odds where the number is between cumsum an cumsum+odds.

(4) Now return to the original table and issue a query such as:

select *
from t
where odds = <winning odds>
order by rand()
limit 1
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thanks for the answer, the winning_odds values are all integers. Now in my table, the highest value is 11 but it could be higher. Yes I can use temporary tables. – Leo Stein Nov 07 '12 at 21:07
0

If I understand the question correctly, you are asking how to select a random record from the table. This should work:

SELECT * 
FROM   tableName
ORDER BY RAND() LIMIT 0,1;

Still now clear how are you planning to user the winning_odds value.

Vaibhav Desai
  • 2,618
  • 1
  • 16
  • 16