0

I have a single table with some simple number i'd like to sum across different date ranges. Example:

date     | cost | storeId | stateId |
-------------------------------------
1/1/2021 | 100  | 1       |  1      |
1/1/2020 | 30   | 2       |  2      |

I would like to sum the costs across different timespans, so sum of the costs of records within 1 month of now, 2 months of now, etc. I also have several conditions on which to count that require JOINs. This led me to creating a massive query that reuses the same joins and where clause (except the date range) and UNIONs:

SELECT SUM(cost)
FROM table1 t1
LEFT JOIN table2 t2 ON t2.id = t1.storeId
LEFT JOIN table3 t3 ON t3.id = t1.stateId
WHERE t1.date > '1monthAgo' t2.otherId = 1 AND t2.anotherId = 2 AND t3.someId = 1
UNION
SELECT SUM(cost)
FROM table1 t1
LEFT JOIN table2 t2 ON t2.id = t1.storeId
LEFT JOIN table3 t3 ON t3.id = t1.stateId
WHERE t1.date > '2monthsAgo' t2.otherId = 1 AND t2.anotherId = 2 AND t3.someId = 1
UNION
SELECT SUM(cost)
FROM table1 t1
LEFT JOIN table2 t2 ON t2.id = t1.storeId
LEFT JOIN table3 t3 ON t3.id = t1.stateId
WHERE t1.date > '3monthsAgo' t2.otherId = 1 AND t2.anotherId = 2 AND t3.someId = 1

In my real query, these joins and where clauses are much more complicated and make this query not just a mess, but very slow (~4 seconds). I tried re-writing this with CASE statements, and now have:

SELECT 
SUM(CASE t1.date > '1monthAgo' THEN t1.cost ELSE 0 END),
SUM(CASE t1.date > '2monthsAgo' THEN t1.cost ELSE 0 END),
SUM(CASE t1.date > '3monthsAgo' THEN t1.cost ELSE 0 END),
FROM table1 t1
LEFT JOIN table2 t2 ON t2.id = t1.storeId
LEFT JOIN table3 t3 ON t3.id = t1.stateId
WHERE t2.otherId = 1 AND t2.anotherId = 2 AND t3.someId = 1

This presents the data slightly differently (all in one row) but it runs much faster (1 second) and doesn't duplicate the JOINs and WHERE clauses. However my real query has 4 different sums across 6 different date ranges, so there are 24 CASE statements and I do not think this is the right way to do this.

Is it possible to have a single query that would share the JOINs and WHERE clauses but allow me to query across different date ranges to gather sums?

Dale K
  • 25,246
  • 15
  • 42
  • 71
Matt
  • 47
  • 1
  • 10
  • 1
    FYI `case` is an *expression* not a *statement*. Based on your description using case expressions is the best way to go. There might be ways to further simplify things, but could only suggest after seeing the actual query. – Dale K Jun 30 '21 at 23:06
  • Is your date indexed? Have you tried to include the cost column directly in the date index? It should be almost instant, even with the union version. Take a look at : https://stackoverflow.com/questions/1307990/why-use-the-include-clause-when-creating-an-index – Guillaume F. Jun 30 '21 at 23:09

1 Answers1

2

One option is to group by EOMONTH which gives you a single date per month, then you have each month in a separate row:

SELECT 
  MonthEnding = EOMONTH(t1.date),
  TotalCost   = SUM(t1.cost)
FROM table1 t1
LEFT JOIN table2 t2 ON t2.id = t1.storeId
LEFT JOIN table3 t3 ON t3.id = t1.stateId
WHERE t2.otherId = 1 AND t2.anotherId = 2 AND t3.someId = 1
GROUP BY EOMONTH(t1.date);

If these "months" are actually not aligned with calendar months then you can join on a VALUES table with month numbers

SELECT 
  v.MonthsAgo,
  TotalCost   = SUM(t1.cost)
FROM table1 t1
LEFT JOIN table2 t2 ON t2.id = t1.storeId
LEFT JOIN table3 t3 ON t3.id = t1.stateId
JOIN (VALUES
  (CAST('20210521' AS date), CAST('20210621' AS date), 1),
  (CAST('20210421' AS date), CAST('20210521' AS date), 2),
  (CAST('20210321' AS date), CAST('20210421' AS date), 3)
) v(StartDate, EndDate, MonthsAgo)
  ON t1.date >= v.StartDate AND t1.date < v.EndDate
WHERE t2.otherId = 1 AND t2.anotherId = 2 AND t3.someId = 1
GROUP BY v.MonthsAgo;
Charlieface
  • 52,284
  • 6
  • 19
  • 43
  • That works perfectly, thank you! Runs in about 2 seconds, though looking at the query execution plan i'm missing some indexes on joined tables so that should speed it up a bit – Matt Jul 01 '21 at 00:58