SQL query to get random unused combination

Question

Background:

I want to create a database that can run a tournament of 1 vs 1 matchups. It needs to keep track of who won and lost each matchup and any comments about that matchup as well as decide the next unique matchup randomly.

Rules:

There are x number of players. Each player will eventually play every other player once, in effect covering all possible unique combinations of players.

Database Tables (with Sample data):

DECLARE @Players TABLE (
    ID INT PRIMARY KEY IDENTITY,
    Name VARCHAR(50)
)

ID Name  
-- ----- 
1  Alex  
2  Bob   
3  Chris 
4  Dave 

DECLARE @Matches TABLE (
    ID INT PRIMARY KEY IDENTITY,
    WinnerId INT,
    LoserId INT
)

ID WinnerId LoserId 
-- -------- ------- 
1  1        2       
2  4        2       
3  3        1    

DECLARE @Comments TABLE (
    ID INT PRIMARY KEY IDENTITY,
    MatchId INT,
    Comment VARCHAR(MAX)
)

ID MatchId Comment                        
-- ------- ------------------------------ 
1  2       That was a close one.          
2  3       I did not expect that outcome.

Problem:

How can I efficiently query to get a single random match up that has not yet occurred?

The major problem is that the number of players can and will grow over time. Right now, my example data has only 4 players, which leaves 6 possible matches.

Alex,Bob
Alex,Chris
Alex,Dave
Bob,Chris
Bob,Dave
Chris,Dave

That is small enough to keep picking two random numbers that correspond to player IDs and then check the Matches table to see whether that matchup has already occurred. If it has, pick two more and repeat. If it hasn't, use it as the next matchup. However, with 10,000 players there would be 49,995,000 possible matchups, and this approach would simply become too slow.

Can anyone point me in the right direction for a more efficient query? I am open to changes in the database design if that would help make things more efficient as well.

score 1 · Accepted Answer · edited May 23 '17 at 11:55


If you make an outer join between every possible pairing and those that have been played, then filter out the ones that have been played, you're left with pairings that have not yet been played. Selecting a random one is then a trivial case of ordering:

SELECT p1.Name, p2.Name FROM
  Players p1
  JOIN Players p2 ON (
    p1.ID < p2.ID
  )
  LEFT JOIN Matches ON (
       (WinnerId = p1.ID AND LoserId = p2.ID)
    OR (WinnerId = p2.ID AND LoserId = p1.ID)
  )
WHERE Matches.ID IS NULL
ORDER BY RAND()
LIMIT 1;

EDIT

As noted by ypercube below, the LIMIT syntax above is MySQL-specific. You may need to use the appropriate syntax for your SQL implementation instead; let us know what it is and someone can advise if required. I know that Microsoft SQL Server uses TOP and Oracle uses ROWNUM, but otherwise your Googling is probably as good as mine. :)
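For anyone who wants to try the anti-join before committing to a database engine, here is a minimal sketch that runs it against the question's sample data using Python's built-in sqlite3 module. This is an illustration under assumptions, not part of the original answer: the helper names `all_unplayed` and `random_unplayed` are made up, the table variables become ordinary tables, and SQLite spells the random function RANDOM() rather than RAND().

```python
import sqlite3

# In-memory SQLite database loaded with the question's sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Players (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Matches (ID INTEGER PRIMARY KEY, WinnerId INT, LoserId INT);
    INSERT INTO Players (ID, Name)
        VALUES (1, 'Alex'), (2, 'Bob'), (3, 'Chris'), (4, 'Dave');
    INSERT INTO Matches (WinnerId, LoserId) VALUES (1, 2), (4, 2), (3, 1);
""")

# Same anti-join as in the answer: every pairing with p1.ID < p2.ID,
# minus the pairings that already appear in Matches in either order.
UNPLAYED_SQL = """
    SELECT p1.Name, p2.Name
    FROM Players p1
    JOIN Players p2 ON p1.ID < p2.ID
    LEFT JOIN Matches m
      ON (m.WinnerId = p1.ID AND m.LoserId = p2.ID)
      OR (m.WinnerId = p2.ID AND m.LoserId = p1.ID)
    WHERE m.ID IS NULL
"""


def all_unplayed(conn):
    """Return every pairing that has not been played, sorted by name."""
    return sorted(conn.execute(UNPLAYED_SQL).fetchall())


def random_unplayed(conn):
    """Return one unplayed pairing chosen at random, or None if none remain."""
    return conn.execute(UNPLAYED_SQL + " ORDER BY RANDOM() LIMIT 1").fetchone()
```

With the sample data, three of the six pairings have been played, so `all_unplayed(conn)` returns the other three and `random_unplayed(conn)` returns one of them.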


answered Apr 22 '12 at 21:46

eggyal


Better version of the answer I gave. Deleting mine and upvoting yours. – JohnFx Apr 22 '12 at 21:47
`LIMIT`? The question is not tagged MySQL. – ypercubeᵀᴹ Apr 22 '12 at 22:17
@ypercube: Good spot. I've updated to incorporate that point. – eggyal Apr 22 '12 at 22:25
I haven't decided on a SQL implementation yet so your MySQL example is fine. If I go with something else I'll just google the equivalent like you suggested. – JMcCon Apr 23 '12 at 01:18

David Z. · Answer 2 · 2012-04-22T21:55:21.043

0

Although the data set is large, using the LIMIT keyword will stop additional processing as soon as a single row is returned. One possibility might be to use a query like the one below to return the next match.

SELECT *
FROM Players p1, Players p2
WHERE p1.ID <> p2.ID
  AND (p1.ID, p2.ID) NOT IN (SELECT WinnerID, LoserID FROM Matches)
  AND (p2.ID, p1.ID) NOT IN (SELECT WinnerID, LoserID FROM Matches)
LIMIT 1

edited Apr 22 '12 at 21:55

answered Apr 22 '12 at 21:45

David Z.


score 0 · Answer 3 · answered Apr 22 '12 at 21:50


I am wondering why you need to pick 2 players at random. How about generating the whole list of possible matches up front, with an added WinnerId column? For the next match, just pick the first row that has no WinnerId set.
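A rough sketch of that idea, again using Python's sqlite3 module with the question's four players. Everything here beyond the WinnerId column is an assumption for illustration: the `Schedule` table, its `PlayOrder` column (a random sort key assigned once at generation time, so "first pending row" still gives a random-looking order), and the helper names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Players (ID INTEGER PRIMARY KEY, Name TEXT);
    INSERT INTO Players (ID, Name)
        VALUES (1, 'Alex'), (2, 'Bob'), (3, 'Chris'), (4, 'Dave');

    -- Hypothetical table: one row per pairing, WinnerId stays NULL until played.
    CREATE TABLE Schedule (
        PlayerA INT,
        PlayerB INT,
        WinnerId INT,
        PlayOrder INT,
        PRIMARY KEY (PlayerA, PlayerB)
    );

    -- Generate every pairing once, each with a random sort key.
    INSERT INTO Schedule (PlayerA, PlayerB, PlayOrder)
    SELECT p1.ID, p2.ID, RANDOM()
    FROM Players p1 JOIN Players p2 ON p1.ID < p2.ID;

    CREATE INDEX idx_schedule_pending ON Schedule (WinnerId, PlayOrder);
""")


def next_match(conn):
    """Return the first pending pairing in PlayOrder, or None when all are played."""
    return conn.execute(
        "SELECT PlayerA, PlayerB FROM Schedule "
        "WHERE WinnerId IS NULL ORDER BY PlayOrder LIMIT 1"
    ).fetchone()


def record_result(conn, player_a, player_b, winner_id):
    """Mark a pairing as played by storing the winner's ID."""
    conn.execute(
        "UPDATE Schedule SET WinnerId = ? WHERE PlayerA = ? AND PlayerB = ?",
        (winner_id, player_a, player_b),
    )
```

The trade-off is storage: the schedule holds one row per pairing, so 10,000 players means roughly fifty million rows up front, and pairings for players who join later have to be inserted as they arrive.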

answered Apr 22 '12 at 21:50

NaN


score 0 · Answer 4 · edited May 23 '17 at 10:34

For your problem, you want A) to consider all 2-element subsets of players B) in a randomized order.

For A, other answers suggest using SQL joins with various conditions. A less database-intensive solution, if you really need to handle 10,000 players, might be to use an efficient combination-generating algorithm. A previous answer lists some from TAOCP vol. 4. For the 2-element subset case, a simple doubly nested loop over the player IDs in lexicographical order would be fine:

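# Python; handle() stands in for whatever processes a single matchup
for player_a in range(1, num_players + 1):
    for player_b in range(player_a + 1, num_players + 1):
        handle(player_a, player_b)  # a vs. b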

For part B, you could use a second table that maps players 1..n to a shuffling of the integers 1..n. Keep this shuffled mapping around until the tournament is finished. You can generate it with the Knuth-Fisher-Yates shuffle.
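As a minimal sketch of part B in Python (the function names are illustrative, and the mapping is held in a list rather than a database table), the shuffle and the nested loop could be combined like this:

```python
import random


def fisher_yates_shuffle(items, rng=random):
    """Return a shuffled copy of items using the Fisher-Yates algorithm."""
    shuffled = list(items)
    # Walk from the last position down, swapping each element with a
    # uniformly chosen element at or before it.
    for i in range(len(shuffled) - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive on both ends
        shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
    return shuffled


def shuffled_matchups(player_ids, rng=random):
    """Yield every unique pairing, walking the players in shuffled order."""
    order = fisher_yates_shuffle(player_ids, rng)
    for a in range(len(order)):
        for b in range(a + 1, len(order)):
            yield order[a], order[b]
```

Note that this randomizes which player sits in each position, not the order of the matchups themselves: whoever lands first in the shuffled order still plays all of their matches before anyone else's begin.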

To keep track of where you are in an instance of this problem, you'll probably want to save the combination generator's state to the database regularly. This would probably be faster than figuring out where you are in the sequence from the original tables alone.
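For the nested-loop generator, the state is just the last pair produced, so resuming only needs a function that steps from one pair to the next. A small Python sketch, assuming positions are numbered 1..num_players (the names `first_pair` and `next_pair` are illustrative):

```python
def first_pair(num_players):
    """Return the first pair in lexicographic order, or None if fewer than 2 players."""
    return (1, 2) if num_players >= 2 else None


def next_pair(pair, num_players):
    """Return the pair after `pair` in lexicographic order, or None at the end."""
    a, b = pair
    if b < num_players:
        # Inner loop still has players left for this a.
        return (a, b + 1)
    if a + 1 < num_players:
        # Inner loop exhausted: advance the outer loop and restart the inner one.
        return (a + 1, a + 2)
    return None
```

Only the most recent pair has to be persisted (for example, in a single-row table), and `next_pair` picks up from there after a restart.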

As you mention, handling 10,000 players in matchups this way results in nearly fifty million matchups to handle. You might consider a tournament structure that doesn't require every player to compete against each other player. For example, if A beats B and B beats C, then you might not have to consider whether A beats C. If applicable in your scenario, that sort of shortcut could save a lot of time.
