To ensure that 1, and only 1, row is returned per project_id
a better method is to use row_number() over()
where the partition by
within the over()
clause is similar to what you would have grouped by and the order by
controls which row within each partition is given the value of 1. In this case the value of 1 is given to a row with the earliest created date, and further columns can also be referenced as tie-breakers (e.g. using id
). Every other row within the partition is given the next integer value so only one row in each partition can be equal to 1. So to limit the final result, use a derived table (subquery) followed by a where clause that restricts the result to the first row per partition i.e. where rn = 1
.
SELECT
*
FROM (SELECT *
, row_number() over(partition by project_id order by created, id) as rn
FROM tasks
) AS derived
WHERE rn = 1
nb: to get the most recent row reverse the direction of ordering on the date column
Not only will this technique ensure only 1 row per partition is returned it also requires fewer passes through the data (than your original approach), so it is efficient as well.
tip: if you did want to get more than 1 row per partition returned then use rank()
or dense_rank()
instead of row_number()
- because the ranking functions will recognize rows of equal rank and hence return the same rank value. i.e. more than 1 row could get a rank value of 1