@Jim provided a valid solution. But there are (not so) subtle performance details. RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
is the default window frame, spelling it out is just noise. The manual:
The default framing option is RANGE UNBOUNDED PRECEDING
, which is
the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
.
Not worth another answer, yet. But as row_number()
operates on rows by definition, it's more efficient to use ROWS
mode:
SELECT customer_id, timestamp
FROM (
SELECT row_number() OVER (PARTITION BY customer_id ORDER BY timestamp
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rn
, customer_id, timestamp
FROM customer_orders
) sub
WHERE rn <= 3
ORDER BY 1, 2;
Either query is much faster with an index(-only) scan on this index:
CREATE UNIQUE INDEX ON customer_orders (customer_id, timestamp);
I ran extensive tests on Postgres 13 and 14 with ROWS
vs. RANGE
mode, and ROWS
is consistently ~ 20% faster with an index-only scan, and around 5 - 10 % without index. (Higher fixed cost lowers the percentage.) Quite a revelation for one of the most commonly used window functions! :)
I reported the issue, and the upcoming Postgres 16 got a fix! See:
That said, if your table is big and there are many rows per customer, a different query style is much faster, yet. I am talking orders of magnitude. We need the same index as above.
Ideally, you have a table customers
with exactly one row per relevant customer_id
. If you don't have it, create it. Then:
SELECT c.customer_id, o.timestamp
FROM customers c
CROSS JOIN LATERAL (
SELECT timestamp
FROM customer_orders o
WHERE o.customer_id = c.customer_id
ORDER BY o.timestamp
LIMIT 3
) o
ORDER BY 1, 2;
db<>fiddle here
Related: