My goal is to create a query that will return a count of unique customers who have purchased in a 365 day window. I created the query below in postgres and and the resulting queries are very slow. My table is 812,024 rows of just order dates and customer id's. When I remove the distinct statement, I can get the query to return a result in approx 60 seconds, with it, I have yet to finish. I created an index on (order_date, id). I am a complete newb to SQL, this is really the first time i have ever done anything with it, and after trying to find a solution to this problem all day, I could not find anything that I could get to work, even though I have seen a lot about the slow performace of distinct.
SELECT
(d1.Ordered) AS Ordered,
COUNT(distinct d2.ID) Users
FROM
(
SELECT order_date AS Ordered
FROM orders
GROUP BY order_date
) d1
INNER JOIN
(
SELECT order_date AS Ordered, id
FROM orders
) d2
ON d2.Ordered BETWEEN d1.Ordered - 364 AND d1.Ordered
GROUP BY d1.Ordered
ORDER BY d1.Ordered
"Sort (cost=3541596.30..3541596.80 rows=200 width=29)"
" Sort Key: orders_1.order_date"
" -> HashAggregate (cost=3541586.66..3541588.66 rows=200 width=29)"
" -> Nested Loop (cost=16121.73..3040838.52 rows=100149627 width=29)"
" -> HashAggregate (cost=16121.30..16132.40 rows=1110 width=4)"
" -> Seq Scan on orders orders_1 (cost=0.00..14091.24 rows=812024 width=4)"
" -> Index Only Scan using x on orders (cost=0.43..1822.70 rows=90225 width=29)"
" Index Cond: ((order_date >= (orders_1.order_date - 364)) AND (order_date <= orders_1.order_date))"