SQL aggregate unique pairs

Question

I have a PostgreSQL table that is mostly a bridge table but it also has some extra stuff.

Essentially it holds the information about players in a game. So we have a unique id for this instance of a player in a game. Then an id that is FK to game table, and an id that is FK to player table. There is also some other irrelevant stuff. Something like this:

Table players_games
| id        | 12564
| player_id | 556
| game_id   | 156184

What I want to do is find how many occurrences there are of a player playing with another. So, if player1 is in the same game as player2, they have played together once. There are 2+ players in a game.

So what I want to do is populate a new table, that holds three values: player_lo, player_hi, times_played.

And either have one row for each pair and the number of times they played, or if it ends up being more efficient, a row for each iteration and have the value set as 1 so these can be added together later, maybe distributed. So you might see something like:

p1, p2, 1
p1, p2, 1

And these get reduced later to:

p1, p2, 2

So I was wondering if there was some clever way to do this with SQL, or if there's SQL that can reduce my programming effort, before starting to write a slightly complex python script to do it.

score 3 · Answer 1 · answered May 20 '12 at 13:00

3

select p1, p2, count(*) from (
    select 
      pg1.player_id as p1, pg1.game_id, pg2.player_id as p2
    from
      players_games pg1, players_games pg2
    where
      pg1.game_id = pg2.game_id and pg1.player_id != pg2.player_id
) foo
group by p1, p2

Note that this does a full join on players_games so it can be very slow if the table is large. The key part is the group by for getting the count.

answered May 20 '12 at 13:00

beerbajay

19,652
6
58
75

Would you consider 100,000,000 rows 'large' in this case? – Tom Carrick May 20 '12 at 13:17
1

I would change the `!=` to a `<` (or `>`), Otherwise you will dubble-count all the pairs. – wildplasser May 20 '12 at 14:02
@wildplasser; you mean that you'd have both `p1, p2` and `p2, p1` ? Isn't that the point? – beerbajay May 20 '12 at 14:14
Yes. I don't know the exact intention of the OP (it is rather unclear to me ...) Could be either way ... – wildplasser May 20 '12 at 14:29
1

@wildplasser: I am pretty sure, your comment about `>` instead of `!=` is accurate. The way it is, it will list `(p1, p2)` twice (in reversed order, with identical count). Other than that, this is a special case of *relational division*. We collected quite a bit on that [in this related question](http://stackoverflow.com/q/7364969/939860). – Erwin Brandstetter May 21 '12 at 16:49
The OP was not very clear. (a bit vague, actually) In most cases you want unique pairs, and not both mirror-images. Aah! your skin a cat topic. See also my "how to massage the optimiser" breakdown from last saturday (shameless plug ;-) – wildplasser May 21 '12 at 16:53

Gordon Linoff · Accepted Answer · 2012-05-20T15:26:04.263

To do this, you need to do a self join on the player_games table. The first subquery is for the first player, and the second for the second player. The "first" player is the one with the lower player id.

select pg1.player_id as player1, pg2.player_id as player2, count(*) as num_games
from (select distinct game_id, player_id
      from  players_games pg
     ) pg1 join
     (select distinct game_id, player_id
      from players_games pg
     ) pg2
     on pg1.game_id = pg2.game_id and
        pg1.player_id < pg2.player_id
group by pg1.player_id, pg2.player_id

Note that the join condition uses a "<" on the player ids. This is to prevent counting duplicates (so players A,B are not also counted as B,A).

Also, I added a "distinct" in the inner subqueries just in case a single player might appear more than once for a given game. Perhaps this is not necessary. To be sure, you should have a unique index on the composite key game_id, player_id.

Seemingly something up with your syntax there: ERROR: missing FROM-clause entry for table "p1" — Tom Carrick, May 20 '12 at 13:59

wildplasser · Answer 3 · 2012-05-30T21:24:28.277

SET search_path='tmp';
DROP TABLE players_game CASCADE;
CREATE TABLE players_game
        ( game_id INTEGER NOT NULL
        , player_id INTEGER NOT NULL
        );
INSERT INTO players_game(game_id,player_id) VALUES
 (1,100) ,(1,101) ,(2,100) ,(2,101)
,(3,100) ,(3,101) ,(4,102) ,(4,101)
        ;

WITH pair AS (
    SELECT g1.player_id AS p1
     , g2.player_id AS p2
    FROM players_game g1
    JOIN players_game g2 ON g1.game_id = g2.game_id
    WHERE g1.player_id < g2.player_id
    )
SELECT pa.p1 , pa.p2, COUNT(*) AS num_games
FROM pair pa
GROUP BY p1, p2
ORDER BY num_games DESC
        ;

Result:

SET
ERROR:  table "players_game" does not exist
CREATE TABLE
INSERT 0 8
 p1  | p2  | num_games 
-----+-----+-----------
 100 | 101 |         3
 101 | 100 |         3
 102 | 101 |         1
 101 | 102 |         1
(4 rows)

SQL aggregate unique pairs

3 Answers3