0

There is a tasks table. id | name | project_id | created | ...

Tasks can be in different projects. I need to return one task from each project with a minimum creation date. Here is my solution

SELECT *
FROM tasks a
JOIN (
    SELECT project_id, min(created) as created
    FROM tasks
    GROUP BY project_id
    ) b
ON a.project_id=b.project_id AND a.created = b.created;

but if there are points in the project with the same creation dates, then I return two records for one project

Nakem1
  • 313
  • 1
  • 10
  • You can add `DISTINCT` will remove duplication ( after select) – Marya Mar 01 '22 at 23:27
  • there are many fields in the table. The entries will be different, but the created one is the same – Nakem1 Mar 01 '22 at 23:29
  • Usually you add `row_number() over (partition by project_id order by created, X) as rn` in your subquery and the filter `where rn = 1` Sort on another column via `X` to break ties. – shawnt00 Mar 01 '22 at 23:29
  • Does this answer your question? [Select first row in each GROUP BY group?](https://stackoverflow.com/questions/3800551/select-first-row-in-each-group-by-group) – Stu Mar 01 '22 at 23:52

1 Answers1

0

To ensure that 1, and only 1, row is returned per project_id a better method is to use row_number() over() where the partition by within the over() clause is similar to what you would have grouped by and the order by controls which row within each partition is given the value of 1. In this case the value of 1 is given to a row with the earliest created date, and further columns can also be referenced as tie-breakers (e.g. using id). Every other row within the partition is given the next integer value so only one row in each partition can be equal to 1. So to limit the final result, use a derived table (subquery) followed by a where clause that restricts the result to the first row per partition i.e. where rn = 1.

SELECT
      *
FROM (SELECT *
         , row_number() over(partition by project_id order by created, id) as rn
      FROM tasks
     ) AS derived
WHERE rn = 1

nb: to get the most recent row reverse the direction of ordering on the date column

Not only will this technique ensure only 1 row per partition is returned it also requires fewer passes through the data (than your original approach), so it is efficient as well.

tip: if you did want to get more than 1 row per partition returned then use rank() or dense_rank() instead of row_number() - because the ranking functions will recognize rows of equal rank and hence return the same rank value. i.e. more than 1 row could get a rank value of 1

Paul Maxwell
  • 33,002
  • 3
  • 32
  • 51