Query Optimization
I need help on optimizing the performance of this query. This query basically looks for the accumulative sum of all the periods that matches the list of case when conditions.
Currently, this query takes about 100 secs to run since it groups by every accounts in the database. I tried to optimize it by looking at the explain output but I couldn't figure out a way to make it work. Here is the explain output:
The idea time would be 10 sec or less. Looking forward to your response. Thank you!
SET @date = '2017-05-17';
SET @offset = 1;
select
b.act,
CASE
WHEN b.jdt <= DATE_SUB(@date, INTERVAL 5 DAY) AND b.jdt >= DATE_SUB(@date, INTERVAL 5 + @offset DAY) AND DATEDIFF(a.dt,b.jdt) <=5 THEN 5
WHEN b.jdt <= DATE_SUB(@date, INTERVAL 13 DAY) AND b.jdt >= DATE_SUB(@date, INTERVAL 13 + @offset DAY) AND DATEDIFF(a.dt,b.jdt) <=13 THEN 13
WHEN b.jdt <= DATE_SUB(@date, INTERVAL 25 DAY) AND b.jdt >= DATE_SUB(@date, INTERVAL 25 + @offset DAY) AND DATEDIFF(a.dt,b.jdt) <=25 THEN 25
WHEN b.jdt <= DATE_SUB(@date, INTERVAL 45 DAY) AND b.jdt >= DATE_SUB(@date, INTERVAL 45 + @offset DAY) AND DATEDIFF(a.dt,b.jdt) <=45 THEN 45
WHEN b.jdt <= DATE_SUB(@date, INTERVAL 75 DAY) AND b.jdt >= DATE_SUB(@date, INTERVAL 75 + @offset DAY) AND DATEDIFF(a.dt,b.jdt) <=75 THEN 75
WHEN b.jdt <= DATE_SUB(@date, INTERVAL 105 DAY) AND b.jdt >= DATE_SUB(@date, INTERVAL 105 + @offset DAY) AND DATEDIFF(a.dt,b.jdt) <=105 THEN 105
WHEN b.jdt <= DATE_SUB(@date, INTERVAL 135 DAY) AND b.jdt >= DATE_SUB(@date, INTERVAL 135 + @offset DAY) AND DATEDIFF(a.dt,b.jdt) <=135 THEN 135
WHEN b.jdt <= DATE_SUB(@date, INTERVAL 165 DAY) AND b.jdt >= DATE_SUB(@date, INTERVAL 165 + @offset DAY) AND DATEDIFF(a.dt,b.jdt) <=165 THEN 165
WHEN b.jdt <= DATE_SUB(@date, INTERVAL 195 DAY) AND b.jdt >= DATE_SUB(@date, INTERVAL 195 + @offset DAY) AND DATEDIFF(a.dt,b.jdt) <=195 THEN 195
WHEN b.jdt <= DATE_SUB(@date, INTERVAL 225 DAY) AND b.jdt >= DATE_SUB(@date, INTERVAL 225 + @offset DAY) AND DATEDIFF(a.dt,b.jdt) <=225 THEN 225
ELSE 'other' END AS 'period',
SUM(CASE WHEN a.type = 'JN' AND a.paid = 'Y' AND a.upgraded=0 THEN 1 ELSE 0 END) AS 'Paid_Joins',
SUM(CASE WHEN a.type IN ('SL','RL') AND ttype !='Purchase' THEN (a.amt_usd/100 - a.vat_usd/100) END) AS 'Revenue_Amount'
FROM __customer b
JOIN __transaction a on b.uid = a.primary_uid
WHERE
b.affiliate_act regexp '^[a-zA-Z]+[0-9]+'
AND a.dt <= @date
AND a.dt >= DATE_SUB(@date, INTERVAL 225 + @offset DAY)
AND b.jdt >= DATE_SUB(@date, INTERVAL 225 + @offset DAY)
GROUP BY 1,2
HAVING period != 'other'
UPDATE
Table structure:
UPDATE2
I queried on the transaction table with the same query logic without the customer table join, and it looks like it still scanning the same row as the join. Since it looks through every combination in the database, I can't of thinking adding more effective where clauses to limit the number of rows scanned.
SET @date = '2017-05-17';
SET @offset = 2;
SET @start = DATE_SUB(@date, INTERVAL 225 + @offset DAY);
explain
select
a.account,
SUM(CASE WHEN a.type = 'JN' AND a.paid = 'Y' AND a.upgraded=0
THEN 1
ELSE 0
END) AS 'Paid_Joins'
FROM __transaction a
WHERE a.account regexp '^[a-zA-Z]+[0-9]+'
AND a.dt <= @date
AND a.dt >= @start
-- AND b.affiliate_act = 'el4557'
GROUP BY 1
The number of rows scanned here is the same as if there is a join.