From the following game
table of sports matches:
+-----+-----------+---------------+----------+-------+-------+---------+---------+
| id_ | date_time | tournament_id | round_id | p1_id | p2_id | p1_stat | p2_stat |
+-----+-----------+---------------+----------+-------+-------+---------+---------+
| 1 | NULL | 1 | 4 | 1 | 3 | 2 | 3 |
| 2 | NULL | 1 | 5 | 1 | 4 | 4 | 6 |
| 3 | NULL | 1 | 9 | 1 | 5 | 6 | 9 |
| 4 | NULL | 1 | 10 | 2 | 1 | 8 | 12 |
| 5 | NULL | 2 | 4 | 1 | 2 | 10 | 15 |
| 6 | NULL | 2 | 5 | 4 | 1 | 12 | 18 |
+-----+-----------+---------------+----------+-------+-------+---------+---------+
I'm trying to get the stats for each player for their previous match. The output should look like this:
+-----+--------------+--------------+
| id_ | prev_p1_stat | prev_p2_stat |
+-----+--------------+--------------+
| 1 | NULL | NULL |
| 2 | 2 | NULL |
| 3 | 4 | NULL |
| 4 | NULL | 6 |
| 5 | 12 | 8 |
| 6 | 6 | 10 |
+-----+--------------+--------------+
However, the date_time
column is quite often blank and id_
is not date sequential. The tournament
table does have a date_time
column which is always populated:
+-----+------------+
| id_ | date_time |
+-----+------------+
| 1 | 1997-01-01 |
| 2 | 1997-01-06 |
+-----+------------+
This means the tournament
date_time
can be used in conjunction with game
round_id
to determine the previous match.
I've found the following answers here and here but they both focus on a single table and don't have the added complexity of having to determine whether the p1_stat
or the p2_stat
should be selected.
I've got as far as this query:
SELECT
g.id_ AS game_id,
CASE
WHEN g.p1_id = sq_p1.p1_id THEN sq_p1.p1_stat
ELSE sq_p1.p2_stat
END AS prev_p1_stat,
CASE
WHEN g.p1_id = sq_p2.p1_id THEN sq_p2.p1_stat
ELSE sq_p2.p2_stat
END AS prev_p2_stat
FROM
test.game AS g
JOIN
test.tournament AS t ON t.id_ = g.tournament_id
LEFT OUTER JOIN
(SELECT
g.id_ AS match_id,
t.date_time AS tournament_date,
g.round_id,
g.p1_id,
g.p2_id,
g.p1_stat,
g.p2_stat
FROM
test.game AS g
JOIN test.tournament AS t ON t.id_ = g.tournament_id) AS sq_p1 ON (sq_p1.p1_id = g.p1_id
OR sq_p1.p2_id = g.p1_id)
AND (sq_p1.tournament_date = t.date_time
AND sq_p1.round_id < g.round_id
OR sq_p1.tournament_date < t.date_time)
LEFT OUTER JOIN
(SELECT
g.id_ AS match_id,
t.date_time AS tournament_date,
g.round_id,
g.p1_id,
g.p2_id,
g.p1_stat,
g.p2_stat
FROM
test.game AS g
JOIN test.tournament AS t ON t.id_ = g.tournament_id) AS sq_p2 ON (sq_p2.p1_id = g.p1_id
OR sq_p2.p2_id = g.p1_id)
AND (sq_p2.tournament_date = t.date_time
AND sq_p2.round_id < g.round_id
OR sq_p2.tournament_date < t.date_time)
ORDER BY t.date_time , g.round_id
But this isn't even close to what I'm looking for :(
I've created a dbfiddle.
One other thing that's perhaps worth mentioning... I intend to use a couple of versions of this query in a union query such that the final result (including all columns for reference) will look like this:
+-----+------------+-----------+-------------+-------------+---------------+------------------+--------------------+
| id_ | player_num | player_id | opponent_id | player_stat | opponent_stat | player_prev_stat | opponent_prev_stat |
+-----+------------+-----------+-------------+-------------+---------------+------------------+--------------------+
| 1 | 1 | 1 | 3 | 2 | 3 | NULL | NULL |
| 1 | 2 | 3 | 1 | 3 | 2 | NULL | NULL |
| 2 | 1 | 1 | 4 | 4 | 6 | 2 | NULL |
| 2 | 2 | 4 | 1 | 6 | 4 | NULL | 2 |
| 3 | 1 | 1 | 5 | 6 | 9 | 4 | NULL |
| 3 | 2 | 5 | 1 | 9 | 6 | NULL | 4 |
| 4 | 1 | 2 | 1 | 8 | 12 | NULL | 6 |
| 4 | 2 | 1 | 2 | 12 | 8 | 6 | NULL |
| 5 | 1 | 1 | 2 | 10 | 15 | 12 | 8 |
| 5 | 2 | 2 | 1 | 15 | 10 | 8 | 12 |
| 6 | 1 | 4 | 1 | 12 | 18 | 6 | 10 |
| 6 | 2 | 1 | 4 | 18 | 12 | 10 | 6 |
+-----+------------+-----------+-------------+-------------+---------------+------------------+--------------------+
Perhaps it makes more sense to do a union and then engineer the previous stats?
For some final info, the actual game
table has about 1.5m rows and the actual tournament
table has about 30k rows. I'm using MySQL 8.0.26.