-1

These are the tables I have:

sales 
(
    "customer_id" VARCHAR(1),
    "order_date" DATE,
    "product_id" INTEGER
);

menu 
(
    "product_id" INTEGER,
    "product_name" VARCHAR(5),
    "price" INTEGER
);

I'm trying to find what was the first item from the menu purchased by each customer. So after a couple of hours I got to this solution:

WITH cte_product AS
(
    SELECT 
        sales.customer_id, menu.product_name, 
        ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY customer_id) row_a
    FROM 
        sales
    INNER JOIN 
        menu ON sales.product_id = menu.product_id
)
SELECT *
FROM cte_product
WHERE row_a = 1

And this in fact gives me back only the first order each customer made by date.

Although I'm happy to have an answer, I'm wondering if there is another way to do this. A more simple way perhaps?

I previously tried DISTINCT, ORDER BY, WHERE and GROUP BY without success, but I'm new in this, so I'm here to ask you guys if this is a good enough way or if there is a better way to do it.

lemon
  • 14,875
  • 6
  • 18
  • 38
Ale
  • 107
  • 6

2 Answers2

4

Employing window functions for this kind of task is the most performant way of doing it. Although there's a tiny issue with your window function:

  • PARTITION BY customer_id will allow you to make a ranking for each of your distinct customer
  • ORDER BY customer_id will order on your customer_id values, within your partition. But inside your partitions, customer_id will always have the same value: your output is the right one by chance at the moment. This is the reason why you should use "order_date" instead of "customer_id".
WITH cte_product AS (
    SELECT sales.customer_id, 
           menu.product_name, 
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) row_a
    FROM       sales
    INNER JOIN menu 
            ON sales.product_id = menu.product_id
)
SELECT *
FROM cte_product
WHERE row_a = 1

Another almost equally performant way of doing it, without subqueries this time, is using TOP(n). This will do is getting your top n values given the imposed ordering. Since we want all records that have row-number = 1, we can use TOP(1) WITH TIES, or translated, all records whose row-number is 1 (tied).

SELECT TOP(1) WITH TIES
       sales.customer_id, 
       menu.product_name
FROM       sales
INNER JOIN menu 
        ON sales.product_id = menu.product_id
ORDER BY ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date)
lemon
  • 14,875
  • 6
  • 18
  • 38
  • 3
    To be honest, WITH TIES is a nudge less performant because it requires a sort. That said, I tend to use this option with smaller tables ... it seems cleaner and easier. Either way +1 – John Cappelletti May 20 '23 at 19:30
1

This is an other way to get first order per customer using group by and the aggregate function min() :

First you need to find first order date :

select customer_id, min(order_date) as min_order_date
from sales s
group by customer_id

Then you need to join sales table with this dataset to get product_id :

select s.*
from sales s
inner join (
  select customer_id, min(order_date) as min_order_date
  from sales s
  group by customer_id
) as t on s.customer_id = s.customer_id and s.order_date = t.min_order_date;

Then :

select s.*, m.product_name
from sales s
inner join (
  select customer_id, min(order_date) as min_order_date
  from sales s
  group by customer_id
) as t on s.customer_id = s.customer_id and s.order_date = t.min_order_date
inner join menu m on m.product_id = s.product_id;

Demo here

SelVazi
  • 10,028
  • 2
  • 13
  • 29