0

Let's say I have an orders table with customer_id, order_total, and order_date columns. I'd like to build a report that shows all customers who haven't placed an order in the last 30 days, with a column for the total amount their last order was.

This gets all of the customers who should be on the report:

select customer, max(order_date), (select order_total from orders o2 where o2.customer = orders.customer order by order_date desc limit 1)
from orders
group by 1
having max(order_date) < NOW() - '30 days'::interval

Is there a better way to do this that doesn't require a subquery but instead uses a window function or other more efficient method in order to access the total amount from the most recent order? The techniques from How to select id with max date group by category in PostgreSQL? are related, but the extra having restriction seems to stop me from using something like DISTINCT ON.

S-Man
  • 22,521
  • 7
  • 40
  • 63
spike
  • 9,794
  • 9
  • 54
  • 85

2 Answers2

1

demo:db<>fiddle


Solution with row_number window function (https://www.postgresql.org/docs/current/static/tutorial-window.html)

SELECT 
    customer, order_date, order_total
FROM (
    SELECT
        *, 
        first_value(order_date) OVER w as last_order, 
        first_value(order_total) OVER w as last_total,
        row_number() OVER w as row_count
    FROM orders
    WINDOW w AS (PARTITION BY customer ORDER BY order_date DESC)
) s
WHERE row_count = 1 AND order_date < CURRENT_DATE - 30

Solution with DISTINCT ON (https://www.postgresql.org/docs/9.5/static/sql-select.html#SQL-DISTINCT):

SELECT
    customer, order_date, order_total
FROM (
    SELECT DISTINCT ON (customer)
        *, 
        first_value(order_date) OVER w as last_order, 
        first_value(order_total) OVER w as last_total
    FROM orders
    WINDOW w AS (PARTITION BY customer ORDER BY order_date DESC)
    ORDER BY customer, order_date DESC
) s
WHERE order_date < CURRENT_DATE - 30

Explanation:

In both solutions I am working with the first_value window function. The window function's frame is defined by customers. The rows within the customers' groups are ordered descending by date which gives the latest row first (last_value is not working as expected every time). So it is possible to get the last order_date and the last order_total of this order.

The difference between both solutions is the filtering. I showed both versions because sometimes one of them is significantly faster

The window function style is creating a row count within the frames. Every first row can be filtered later. This is done by adding a row_number window function. The benefit of this solution comes out when you are trying to filter the first two or three data sets. You simply have to change the filter from WHERE row_count = 1 to WHERE row_count = 2

But if you want only one single row per group you just need to ensure that the expected row per group is ordered to be the first row in the group. Then the DISTINCT ON function can delete all following rows. DISTINCT ON (customer) gives the first (ordered) row per customer group.

S-Man
  • 22,521
  • 7
  • 40
  • 63
0

Try to join table on itself

select o1.customer, max(order_date),
from orders o1
join orders o2 on o1.id=o2.id
group by o1.customer
having max(o1.order_date) < NOW() - '30 days'::interval

Subqueries in select is a bad idea, because DB will execute a query for each row

If you use postgres you can also try to use CTE

https://www.postgresql.org/docs/9.6/static/queries-with.html

WITH t as (
select id, order_total from orders o2 where o2.customer = orders.customer 
order by order_date desc limit 1
) select o1.customer, max(order_date),
from orders o1
join t t.id=o2.id
group by o1.customer
having max(order_date) < NOW() - '30 days'::interval
Rajeev Ranjan
  • 497
  • 4
  • 16
rgen3
  • 657
  • 5
  • 10
  • I want to do this without the extra join, possibly with a window function. I don't think a CTE is better than my original example. – spike Sep 21 '18 at 22:05
  • You can use in CTE something like `select distinct on (id) id, order_total from orders...` than pg should stop on first occurrence, that may give you some performance improvement. Using window function adds an additional cycle for the query and may not give you any performance improvement. Nevertheless, if you need more information, view on explain results, it should help to choose the right way solution – rgen3 Sep 21 '18 at 22:13