About tableA
- On MySQL 5.7.38 (AWS RDS)
- Contains +6.000.000 rows
- Property
last_activity_at
is set as anINDEX
and hasdatetime
format - Property
project_uid
is set as anINDEX
and is part of a Foreign Key
I'm performing the following SQL query
SELECT project_uid, COUNT(uid) AS "count"
FROM tableA
WHERE last_activity_at >= "2023-05-01 00:00:00"
AND last_activity_at <= "2023-05-31 23:59:59"
GROUP BY project_uid
Here is the result (only performance)
117 rows in set (5 min 49,69 sec)
Here is the EXPLAIN
results of the query above
+----+-------------+--------+------------+-------+-----------------------------------------------------------------------------------------------------------------------------+--------------------------------+---------+------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+-------+-----------------------------------------------------------------------------------------------------------------------------+--------------------------------+---------+------+---------+----------+-------------+
| 1 | SIMPLE | tableA | NULL | index | IDX_2dd9918084b7ec339b29123b7e,IDX_acd46110be997a5678aa809f1c,FK_c0bc0d43ba64ca68b4fc7b7a095,IDX_8d2721775c4645ba9c620ec9fa | FK_c0bc0d43ba64ca68b4fc7b7a095 | 147 | NULL | 6235932 | 50.00 | Using where |
+----+-------------+--------+------------+-------+-----------------------------------------------------------------------------------------------------------------------------+--------------------------------+---------+------+---------+----------+-------------+
Here is the result without group by
(which is still a lot for a simple where query)
1376493 rows in set (31,74 sec)
Here is the EXPLAIN
of the query without GROUP BY
+----+-------------+--------+------------+------+--------------------------------+------+---------+------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+------+--------------------------------+------+---------+------+---------+----------+-------------+
| 1 | SIMPLE | tableA | NULL | ALL | IDX_8d2721775c4645ba9c620ec9fa | NULL | NULL | NULL | 6237888 | 50.00 | Using where |
+----+-------------+--------+------------+------+--------------------------------+------+---------+------+---------+----------+-------------+
I'm trying to understand two things here :
- Why adding a
group by
impact performances that much even with awhere
clause that is supposed to reduce the number of rows to group ? - Why the query without the
group by
is still taking 30 sec on a 6M rows table ? Maybe it's just me but 6M is not that much for a database ?
Thanks for your help. If you have any optimization recommandation I'll be glad to read it!