Table A: (columns: id (integer), name (varchar), ...)
Table B: (columns: id (integer), a_id (integer), value (numeric), ...)
Table C: (columns: id (integer), a_id (integer), b_id (integer), date (date), ...)
SELECT A.name,
       SUM(B.value) AS total_value,
       COUNT(DISTINCT C.date) AS distinct_dates
FROM A
JOIN B ON A.id = B.a_id
JOIN C ON B.id = C.b_id
WHERE C.date BETWEEN '2022-01-01' AND '2022-12-31'
GROUP BY A.name
ORDER BY total_value DESC;
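A note on what the query computes: if a single B row can match several C rows (an assumption, since the schema does not say whether b_id is unique in C), the join repeats that B row once per match, so SUM(B.value) adds its value more than once. A minimal check, using only the columns listed above, might look like this:

```sql
-- Count how many C rows in the date range point at each B row.
-- Every row returned here is a B row whose value is summed more than once
-- by the main query.
SELECT C.b_id, COUNT(*) AS matches_per_b_row
FROM C
WHERE C.date BETWEEN '2022-01-01' AND '2022-12-31'
GROUP BY C.b_id
HAVING COUNT(*) > 1
ORDER BY matches_per_b_row DESC;
```

If this returns rows, total_value is inflated by the join, and the same fan-out adds to the number of rows the aggregation has to process.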
The query takes a considerable amount of time to execute, even when limited to a specific date range. Indexes are in place on the relevant columns.
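For concreteness, "relevant columns" is read here as the join keys and the filter column. The definitions below are an assumption about the existing setup, not a confirmed listing: the index names are hypothetical, and A.id and B.id are assumed to be primary keys that are already indexed.

```sql
-- Hypothetical single-column indexes; the actual definitions are not stated.
CREATE INDEX idx_b_a_id ON B (a_id);  -- supports JOIN B ON A.id = B.a_id
CREATE INDEX idx_c_b_id ON C (b_id);  -- supports JOIN C ON B.id = C.b_id
CREATE INDEX idx_c_date ON C (date);  -- supports the BETWEEN filter on C.date
```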
Table A has approximately 10 million rows.
Table B has approximately 20 million rows.
Table C has approximately 50 million rows.
The query execution time grows much faster than linearly as the date range widens.
What specific indexes, query rewrites, or configuration changes would improve the execution time?
What alternative approaches are there for handling a join and aggregation of this kind on a large dataset?