2

I have a system where users can earn 1 or more credits for different types of contributions. These are stored within 2 tables:

CREATE TABLE user_contribution_types (
  type_id INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
  title VARCHAR(255) NOT NULL,
  credits DECIMAL(5,2) UNSIGNED NOT NULL,
  valid TINYINT(1) UNSIGNED NOT NULL DEFAULT 1,

  PRIMARY KEY (type_id)
);

CREATE TABLE user_contributions (
  user_id INTEGER UNSIGNED NOT NULL,
  type_id INTEGER UNSIGNED NOT NULL,
  create_date DATETIME NOT NULL,
  valid TINYINT(1) UNSIGNED NOT NULL DEFAULT 1,

  FOREIGN KEY (user_id)
    REFERENCES users(user_id),
  FOREIGN KEY (type_id)
    REFERENCES user_contribution_types(type_id)
);

I can select the total credits earned since a specific date with the following:

SELECT SUM(credits) AS total
FROM   user_contribution_types AS a
JOIN   user_contributions AS b ON a.type_id = b.type_id
WHERE  b.create_date >= '2017-05-01 00:00:00'
       AND a.valid = TRUE
       AND b.valid = TRUE

Likewise, I could include a match for b.user_id to find the total credits for that specific user.

What I would like to do is treat each credit earned as an entry into a giveaway, and select 3 random (unique) user_ids from the total. So if one user has earned 26 credits, they will have 26 chances to win.

How can this be done with SQL, or would it make more sense to do it at the application level? I would prefer a solution that is as close to truly random as possible.

mister martin
  • 6,197
  • 4
  • 30
  • 63

2 Answers2

2

You can select one user by calculating the cumulative distribution and using rand():

SELECT uc.*
FROM (SELECT uc.user_id, (@t := @t + total) as running_total
      FROM (SELECT uc.user_id, SUM(credits) as total
            FROM user_contribution_types ct JOIN
                 user_contributions c
                 ON ct.type_id = c.type_id
            WHERE c.create_date >= '2017-05-01' AND ct.valid = TRUE AND c.valid = TRUE
            GROUP BY uc.user_id
           ) uc CROSS JOIN
           (SELECT @t := 0) params
      ORDER BY rand()
     ) uc
WHERE rand()*@t BETWEEN (running_total - total) AND running_total;

There is a minuscule chance that this will return two values, if rand() is exactly on the boundary. For your purposes, this is not an issue; you can just add limit 1.

To extend this to multiple rows, you can simply modify the WHERE clause to:

WHERE rand()*@t BETWEEN (running_total - total) AND running_total OR
      rand()*@t BETWEEN (running_total - total) AND running_total OR
      rand()*@t BETWEEN (running_total - total) AND running_total

The problem is that all the resulting values might be the same results.

You can randomly choose more than three values. My inclination would be to choose a larger number, such as 9:

WHERE 0.1*@t BETWEEN (running_total - total) AND running_total OR
      0.2*@t BETWEEN (running_total - total) AND running_total OR
      0.3*@t BETWEEN (running_total - total) AND running_total OR
      . . .
ORDER BY rand()  -- redundant, but why not?
LIMIT 3

Or more simply:

WHERE FLOOR( 10*(running_total - total)/@t)) <> FLOOR( 10*running_total/@t)
ORDER BY rand()
LIMIT 3

This is easier because you can change the 10 and test any number of equally spaced points along the cumulative distribution.

mister martin
  • 6,197
  • 4
  • 30
  • 63
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • If I understand correctly, your first example could be used in combination with application level logic? So I could wrap it into a function and pass the results back to itself? Ie, `WHERE user_id NOT IN ($winners)`. And that would completely rule out duplicate results... – mister martin May 09 '17 at 18:13
  • Also I'm getting this error: Unknown column 'uc.user_id' in 'field list' – mister martin May 09 '17 at 19:02
  • @mistermartin . . . That column is specified in your question. You might have been missing `u.user_id`, but I fixed the aliases. – Gordon Linoff May 09 '17 at 23:28
  • Yes, I'm changing your example `userId` and `userid` to match my example, `user_id`. It is still generating the same error. – mister martin May 10 '17 at 01:49
  • I would love to accept this answer, but I cannot get past the error mentioned. I'm executing this within phpmyadmin (4.0.10deb1): Unknown column 'uc.user_id' in 'field list', even though the column does in fact exist. – mister martin May 10 '17 at 08:42
0

Well I couldn't get Gordon's code to run without error, so I wound up reverting back to application logic and followed the solution found here. Example:

// pick a random winner since a given date
// optionally exclude certain users
public function getWinner($date, array $exclude = []) {
    if (!empty($exclude)) {
        $in = implode(',', array_fill(0, count($exclude), '?'));
        array_unshift($exclude, $date);

        $sql = "SELECT   b.user_id, SUM(credits) AS total
                FROM     user_contribution_types AS a
                JOIN     user_contributions AS b ON a.type_id = b.type_id
                WHERE    b.create_date >= ?
                         AND b.user_id NOT IN ($in)
                         AND a.valid = TRUE
                         AND b.valid = TRUE
                GROUP BY b.user_id";
        $sth = $this->db->prepare($sql);
        $sth->execute($exclude);
    } else {
        $sql = "SELECT   b.user_id, SUM(credits) AS total
                FROM     user_contribution_types AS a
                JOIN     user_contributions AS b ON a.type_id = b.type_id
                WHERE    b.create_date >= :date
                         AND a.valid = TRUE
                         AND b.valid = TRUE
                GROUP BY b.user_id";
        $sth = $this->db->prepare($sql);
        $sth->execute([':date' => $date]);
    }

    $result = [];
    while ($row = $sth->fetch(PDO::FETCH_ASSOC)) {
        $result[$row['user_id']] = floor($row['total']);
    }

    // cryptographically secure pseudo-random integer, otherwise fallback
    $total = array_sum($result);
    if (function_exists('random_int')) {
        $rand = $total > 0 ? random_int(0, $total - 1) : 0;
    } else {
        // fallback, NOT cryptographically secure
        $rand = $total > 0 ? mt_rand(0, $total - 1) : 0;
    }

    $running_total = 0;
    foreach ($result as $user_id => $credits) {
        $running_total += $credits;
        if ($running_total > $rand) {
            // we have a winner
            return $user_id;
        }
    }

    return false;
}

So I can basically execute this code as many times as I want to select multiple winners:

$ts = '2017-01-01 00:00:00';
$first_place = getWinner($ts);
$second_place = getWinner($ts, [$first_place]);
$third_place = getWinner($ts, [$first_place, $second_place]);

Unless an alternative solution is posted I will accept this as the answer.

Community
  • 1
  • 1
mister martin
  • 6,197
  • 4
  • 30
  • 63