0

From the following game table of sports matches:

+-----+-----------+---------------+----------+-------+-------+---------+---------+
| id_ | date_time | tournament_id | round_id | p1_id | p2_id | p1_stat | p2_stat |
+-----+-----------+---------------+----------+-------+-------+---------+---------+
|   1 | NULL      |             1 |        4 |     1 |     3 |       2 |       3 |
|   2 | NULL      |             1 |        5 |     1 |     4 |       4 |       6 |
|   3 | NULL      |             1 |        9 |     1 |     5 |       6 |       9 |
|   4 | NULL      |             1 |       10 |     2 |     1 |       8 |      12 |
|   5 | NULL      |             2 |        4 |     1 |     2 |      10 |      15 |
|   6 | NULL      |             2 |        5 |     4 |     1 |      12 |      18 |
+-----+-----------+---------------+----------+-------+-------+---------+---------+

I'm trying to get the stats for each player for their previous match. The output should look like this:

+-----+--------------+--------------+
| id_ | prev_p1_stat | prev_p2_stat |
+-----+--------------+--------------+
|   1 | NULL         | NULL         |
|   2 | 2            | NULL         |
|   3 | 4            | NULL         |
|   4 | NULL         | 6            |
|   5 | 12           | 8            |
|   6 | 6            | 10           |
+-----+--------------+--------------+

However, the date_time column is quite often blank and id_ is not date sequential. The tournament table does have a date_time column which is always populated:

+-----+------------+
| id_ | date_time  |
+-----+------------+
|   1 | 1997-01-01 |
|   2 | 1997-01-06 |
+-----+------------+

This means the tournament date_time can be used in conjunction with game round_id to determine the previous match.

I've found the following answers here and here but they both focus on a single table and don't have the added complexity of having to determine whether the p1_stat or the p2_stat should be selected.

I've got as far as this query:

SELECT 
    g.id_ AS game_id,
    CASE
        WHEN g.p1_id = sq_p1.p1_id THEN sq_p1.p1_stat
        ELSE sq_p1.p2_stat
    END AS prev_p1_stat,
    CASE
        WHEN g.p1_id = sq_p2.p1_id THEN sq_p2.p1_stat
        ELSE sq_p2.p2_stat
    END AS prev_p2_stat
FROM
    test.game AS g
        JOIN
    test.tournament AS t ON t.id_ = g.tournament_id
        LEFT OUTER JOIN
    (SELECT 
        g.id_ AS match_id,
            t.date_time AS tournament_date,
            g.round_id,
            g.p1_id,
            g.p2_id,
            g.p1_stat,
            g.p2_stat
    FROM
        test.game AS g
    JOIN test.tournament AS t ON t.id_ = g.tournament_id) AS sq_p1 ON (sq_p1.p1_id = g.p1_id
        OR sq_p1.p2_id = g.p1_id)
        AND (sq_p1.tournament_date = t.date_time
        AND sq_p1.round_id < g.round_id
        OR sq_p1.tournament_date < t.date_time)
        LEFT OUTER JOIN
    (SELECT 
        g.id_ AS match_id,
            t.date_time AS tournament_date,
            g.round_id,
            g.p1_id,
            g.p2_id,
            g.p1_stat,
            g.p2_stat
    FROM
        test.game AS g
    JOIN test.tournament AS t ON t.id_ = g.tournament_id) AS sq_p2 ON (sq_p2.p1_id = g.p1_id
        OR sq_p2.p2_id = g.p1_id)
        AND (sq_p2.tournament_date = t.date_time
        AND sq_p2.round_id < g.round_id
        OR sq_p2.tournament_date < t.date_time)
ORDER BY t.date_time , g.round_id

But this isn't even close to what I'm looking for :(

I've created a dbfiddle.

One other thing that's perhaps worth mentioning... I intend to use a couple of versions of this query in a union query such that the final result (including all columns for reference) will look like this:

+-----+------------+-----------+-------------+-------------+---------------+------------------+--------------------+
| id_ | player_num | player_id | opponent_id | player_stat | opponent_stat | player_prev_stat | opponent_prev_stat |
+-----+------------+-----------+-------------+-------------+---------------+------------------+--------------------+
|   1 |          1 |         1 |           3 |           2 |             3 | NULL             | NULL               |
|   1 |          2 |         3 |           1 |           3 |             2 | NULL             | NULL               |
|   2 |          1 |         1 |           4 |           4 |             6 | 2                | NULL               |
|   2 |          2 |         4 |           1 |           6 |             4 | NULL             | 2                  |
|   3 |          1 |         1 |           5 |           6 |             9 | 4                | NULL               |
|   3 |          2 |         5 |           1 |           9 |             6 | NULL             | 4                  |
|   4 |          1 |         2 |           1 |           8 |            12 | NULL             | 6                  |
|   4 |          2 |         1 |           2 |          12 |             8 | 6                | NULL               |
|   5 |          1 |         1 |           2 |          10 |            15 | 12               | 8                  |
|   5 |          2 |         2 |           1 |          15 |            10 | 8                | 12                 |
|   6 |          1 |         4 |           1 |          12 |            18 | 6                | 10                 |
|   6 |          2 |         1 |           4 |          18 |            12 | 10               | 6                  |
+-----+------------+-----------+-------------+-------------+---------------+------------------+--------------------+

Perhaps it makes more sense to do a union and then engineer the previous stats?

For some final info, the actual game table has about 1.5m rows and the actual tournament table has about 30k rows. I'm using MySQL 8.0.26.

Jossy
  • 589
  • 2
  • 12
  • 36
  • 1
    Since you're using MySQL 8.x you should be able to use the `LAG()` window function to pair a row with the previous row. – Barmar Dec 13 '21 at 22:27

1 Answers1

0

Kudos to @Barmer for the direction - here's the query I created using LAG():

WITH union_matches AS (
    SELECT 
        g.id_ AS match_id,
        t.date_time AS tournament_date,
        g.round_id AS round_id,
        1 AS player_num,
        g.p1_id AS player_id,
        g.p2_id AS opponent_id,
        g.p1_stat AS player_stat,
        g.p2_stat AS opponent_stat
    FROM
        game AS g
            JOIN
        tournament AS t ON t.id_ = g.tournament_id 
    UNION SELECT 
        g.id_ AS match_id,
        t.date_time AS tournament_date,
        g.round_id AS round_id,
        2 AS player_num,
        g.p2_id AS player_id,
        g.p1_id AS opponent_id,
        g.p2_stat AS player_stat,
        g.p1_stat AS opponent_stat
    FROM
        game AS g
            JOIN
        tournament AS t ON t.id_ = g.tournament_id
)
SELECT 
    match_id,
    player_num,
    player_id,
    opponent_id,
    player_stat,
    opponent_stat,
    LAG(player_stat, 1) OVER (PARTITION BY player_id ORDER BY tournament_date, round_id) AS wrong_player_prev_stat,
    LAG(opponent_stat, 1) OVER (PARTITION BY opponent_id ORDER BY tournament_date, round_id) AS wrong_opponent_prev_stat
FROM
    union_matches
ORDER BY
    tournament_date, round_id, player_num

And a link to the dbfiddle.

Jossy
  • 589
  • 2
  • 12
  • 36