Heap Analytics has a nice blog post about lateral joins for doing something pretty similar. It might give you some ideas. Your situation is actually simpler than theirs though, so your solution is easier too.
First a few notes. You don't seem to need the day
output, since it will always equal your inputs. Second, no matter what, you will need a separate output column for each day (or accumulate results in an array, which seems less desirable), so if you want a variable number of days, you'll have to dynamically build the SQL for that.
For testing I made a table and gave it a few rows:
create table messages (user_id integer, created_at timestamp);
insert into messages values (1, now() - interval '5 days'), (1, now() - interval '4 days'), (1, now() - interval '2 days');
insert into messages values (2, now() - interval '10 days'), (2, now() - interval '2 days');
insert into messages values (3, now() - interval '2 days'), (3, now() - interval '1 days');
insert into messages values (4, now() - interval '5 days');
I think you can get a very clean solution using lateral joins, kind of like the article above:
\set start_time '''2016-06-23 06:00:00'''
WITH t(s) AS (
SELECT :start_time::timestamp
)
SELECT COUNT(DISTINCT m1.user_id) AS day_5_messages,
COUNT(DISTINCT m2.user_id) AS day_4_messages,
COUNT(DISTINCT m3.user_id) AS day_3_messages,
COUNT(DISTINCT m4.user_id) AS day_2_messages,
COUNT(DISTINCT m5.user_id) AS day_1_messages
FROM messages m1
CROSS JOIN t
LEFT OUTER JOIN LATERAL (
SELECT * FROM messages msub
WHERE msub.user_id = m1.user_id
AND msub.created_at <@
tsrange(t.s + interval '1 day',
t.s + interval '2 days')
LIMIT 1
) m2
ON true
LEFT OUTER JOIN LATERAL (
SELECT * FROM messages msub
WHERE msub.user_id = m2.user_id
AND msub.created_at <@
tsrange(t.s + interval '2 days',
t.s + interval '3 days')
LIMIT 1
) m3
ON true
LEFT OUTER JOIN LATERAL (
SELECT * FROM messages msub
WHERE msub.user_id = m3.user_id
AND msub.created_at <@
tsrange(t.s + interval '3 days',
t.s + interval '4 days')
LIMIT 1
) m4
ON true
LEFT OUTER JOIN LATERAL (
SELECT * FROM messages msub
WHERE msub.user_id = m4.user_id
AND msub.created_at <@
tsrange(t.s + interval '4 days',
t.s + interval '5 days')
LIMIT 1
) m5
ON true
WHERE m1.created_at <@
tsrange(t.s,
t.s + interval '1 day')
;
Here I am using the t(s)
CTE just to avoid repeating :start_time
again and again. It is optional if you don't like it. Also naturally in Rails you would use ?
instead of :start_time
to parameterize the query.
For testing it is helpful to replace each COUNT(...)
with array_agg(...)
so you can decide if the right user_id
s are included or not.
I think this should perform well if you have an index on created_at
and user_id
(together). Or if your days will always start at the same instant (say midnight UTC), then you could use a functional index with just the date (not timestamp) and user_id
, and then replace all the range conditions with simply being that day. That will perform even better.
Oh also: your query (and mine) always return just one row, which seems pretty suspicious. I wonder if that is really what you want, or if that's just an accident of simplifying things for your question. If you wanted one row per starting day, then you could put your day
column back, group by it, remove my WHERE
condition, and do all the joins based on the previous m
table instead of t.s
.