38

I have a table named Course in a Postgres database:

sample data for table Course

How can I select rows which have course name with latest date? I mean if I have two same course names for one ID, I should only show the latest one as the below result.

Simply, I want only to show the latest row per ("ID", "Course Name").

desired result

And what if I have two date columns in table Course, which are StartDate & EndDate and I want to show the same based on EndDate only?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Aan
  • 12,247
  • 36
  • 89
  • 150
  • 1
    possible duplicate of [How to select id with max date group by category in PostgreSQL?](http://stackoverflow.com/questions/16914098/how-to-select-id-with-max-date-group-by-category-in-postgresql) – elsadek Sep 29 '14 at 06:05
  • Thank you for asking this. GREAT example. – Evil Elf Nov 20 '15 at 18:31

5 Answers5

52

In PostgreSQL, to get unique rows for a defined set of columns, the preferable technique is generally DISTINCT ON:

SELECT DISTINCT ON ("ID") *
FROM   "Course"
ORDER  BY "ID", "Course Date" DESC NULLS LAST, "Course Name";

Assuming you actually use those unfortunate upper case identifiers with spaces.

You get exactly one row per ID this way - the one with the latest known "Course Date" and the first "Course Name" (according to sort order) in case of ties on the date.

You can drop NULLS LAST if your column is defined NOT NULL.

To get unique rows per ("ID", "Course Name"):

SELECT DISTINCT ON ("ID", "Course Name") *
FROM   "Course"
ORDER  BY "ID", "Course Name", "Course Date" DESC NULLS LAST;

There are faster query techniques for many rows per group. Further reading:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • What if I have two date columns in `Course` table, which are `StartDate` & `EndDate` and I want to show the same based on `EndDate` only.? – Aan Sep 26 '13 at 12:56
  • @Aan: `... ORDER BY "ID", "Course Name", "EndDate" DESC NULLS LAST` to get the row with the latest `"EndDate"`. Basically, order by as many columns (or expressions) as you want. Make it unambiguous or get an arbitrary pick from qualifying peers. I also added the `DISTINCT ON` columns to `ORDER BY` as required, those were missing in my first draft. – Erwin Brandstetter Sep 26 '13 at 22:50
  • upvoted! works well if you have a desc index like (id, date desc) but if you have (id, date) it is slow, i am assuming this is because the index operates in 2 levels having id at the first level and then doing a date lookup inside, what if the index was (date, id) would that mean date desc wont be required? – PirateApp Jun 12 '18 at 04:58
  • 1
    @PirateApp: An index on `(date, id)` would be hardly useful. You need `id` first. Follow the link in the answer for more. Also: https://dba.stackexchange.com/a/27493/3684 and https://dba.stackexchange.com/a/39599/3684 – Erwin Brandstetter Jun 12 '18 at 10:31
  • Just be aware that this is not a good answer if you've got 16GB of records in the database. – Owl Oct 28 '21 at 13:28
  • @Owl: The size of the table is not the deciding factor. *Data distribution* is. For few duplicates, `DISTINCT ON` is typically fastest. See: https://stackoverflow.com/a/34715134/939860 Depending on data distribution and setup, there may be much faster techniques: https://stackoverflow.com/a/25536748/939860 – Erwin Brandstetter Oct 28 '21 at 14:28
9
SELECT "ID", "Course Name", MAX("Course Date") FROM "Course" GROUP BY "ID", "Course Name"
Maxim Krizhanovsky
  • 26,265
  • 5
  • 59
  • 89
7
SELECT *
FROM (SELECT ID, CourseName, CourseDate, 
      MAX(CourseDate) OVER (PARTITION BY COURSENAME) as MaxCourseDate
FROM Course) x
WHERE CourseDate = MaxCourseDate

Here the MAX() OVER(PARTITION BY) allows you to find the highest CourseDate for each Course (the partition) in a derived table. Then you can just select for the rows where the CourseDate is equal to the maximum Coursedate found for that Course.

This approach has the benefit of not using a GROUP BY clause, which would restrict which columns you could return since any non-aggregrate column in the SELECT clause would also have to be in the GROUP BY clause.

GVIrish
  • 361
  • 1
  • 6
0

Try this:

SELECT DISTINCT ON (c."Id", c."Course Name") 
    c."Id", c."Course Name", c."Course Date" 
FROM (SELECT * FROM "Course" ORDER BY "Course Date" DESC) c;
Minoru
  • 1,680
  • 3
  • 20
  • 44
-1
SELECT * 
FROM  course
GROUP BY id,course name
order by course_date desc
Dev
  • 3,410
  • 4
  • 17
  • 16