I have a table with about 13 million rows. Each row represents a measurement of a certain type for a particular project for a particular hour of a particular day.
I have a query that finds sums or averages of these values depending on the measurement type. It's very slow, like a few minutes.
We have certain reporting pages that utilize the results of this query, but it's not acceptable for the pages to take multiple minutes to load. My solution so far has been to cache the results of the query in what I'm calling a summary table.
The problem is that the nightly-run script that refreshes the summary table takes too long to run. I'm not even trying to refresh the whole summary table at once but it still takes too long. (By "too long" I mean errors are raised and the refresh job is not completed.)
I have a hunch that the challenges I'm facing are a result of going about things in the wrong way, and that the solution is maybe not to tweak a few things to shave off the query run time by 1% here and there but to approach the problem in a completely different way.
Any suggestions would be appreciated. I apologize if I'm not asking this question in a very good way; I'm not sure how to formulate it better. Happy to provide clarification or more details.
Here's a simplified version of the query that takes forever to run. (Even this simplified version takes quite a long time.)
select date(calc_dt),
project_id,
calculation_type_cd,
sum(result)
from calc_calculation_results
group by date(calc_dt),
project_id,
calculation_type_cd
The nightly job is basically a SELECT INTO
that takes the results of this query and puts them into my summary table. The result
column is the value we're interested in for the purpose of the reports.