I need to cache the first, last, and second to last time a thing happened per user. The history table I'm querying has hundreds of millions of rows (we're caching so we can truncate it), and the table I'm updating has dozens of millions.
Currently I'm doing it in batches of 1000 to avoid locking the tables. The query is like so:
with ranked as (
select
user_id,
rank() over (partition by user_id order by created_at desc) as ranked_desc,
rank() over (partition by user_id order by created_at asc) as ranked_asc,
created_at
from history
where type = 'SomeType' and
user_id between $1 and $2
)
update
users u
set
latest_at = (
select created_at
from ranked
where ranked.ranked_desc = 1 and ranked.user_id = u.id
),
previous_at = (
select created_at
from ranked
where ranked.ranked_desc = 2 and ranked.user_id = u.id
),
first_at = (
select created_at
from ranked
where ranked.ranked_asc = 1 and ranked.user_id = u.id
)
from ranked
where u.id = ranked.user_id
Relevant indexes on history are these. They are all btree indexes.
- (created_at)
- (user_id, created_at)
- (user_id, type)
- (type, created_at)
Can this be optimized? I feel this can be done without the subqueries.