1

I have a GROUP BY / MAX(.) situation in SQL server. Please consider the following table, Tab1, with yearly repeating course entries:

studentName Course startDate finishDate
N1 C1 2020-01-01 NULL
N1 C1 2019-01-01 2019-02-01
N1 C1 2018-01-01 2018-02-01
N2 C1 2020-01-01 2020-02-01
N2 C1 2019-01-01 NULL
N2 C1 2018-01-01 2018-02-01
N2 C2 2020-01-01 NULL
N2 C2 2019-01-01 2019-02-01
N2 C2 2018-01-01 2018-02-01

A NULL means the student did not finish the course. I want to access the latest attempt for each student in each course. The output of

SELECT studentName, Course, MAX(startDate), MAX(finishDate)
FROM Tab1
GROUP BY studentName, Course

is:

studentName Course startDate finishDate
N1 C1 2020-01-01 2019-02-01
N2 C1 2020-01-01 2020-02-01
N2 C2 2020-01-01 2019-02-01

This is not correct output, as there should be NULL for N1 <-> C1 and N2 <-> C2 combinations. How can I take MAX(finishDate) while retaining NULL?

Thank you.

Dale K
  • 25,246
  • 15
  • 42
  • 71
ImranAli
  • 166
  • 1
  • 8
  • It's not very clear from your sample data what the output you want.. can you show, form your sample data, what the output should be? for the combination of N2 C1 does it count as finished as there is a higher start date and end date or should the NULL be reported? – Harry Jun 22 '21 at 00:52
  • 1
    @DaleK Done! Thank you for reminder. – ImranAli Jun 22 '21 at 02:36
  • Does this answer your question? [Get top 1 row of each group](https://stackoverflow.com/questions/6841605/get-top-1-row-of-each-group) – Charlieface Jun 22 '21 at 10:32
  • @Charlieface I think my question is different. I am interested in preserving `NULL` when taking `MAX(.)` within a group. – ImranAli Jun 22 '21 at 22:07
  • It's not different at all. That is what "top 1 per group" means, get the top row per group. The best answer there uses `row_number()` just like the answer here – Charlieface Jun 22 '21 at 22:09

2 Answers2

2

Assuming that you want the record with the latest start date for each student and course combo - one way to get this is to use the row_number function.

something like this:

;with student_select as (
SELECT 
    studentName
    , Course
    , startDate
    ,finishDate
    ,row_number() over (partition by studentName,Course order by startdate desc) as row_num

FROM Tab1 
)

Select 

    studentName
    , Course
    , startDate
    ,finishDate 

from student_select
where row_num = 1

see SQL FIDDLE example here : - http://www.sqlfiddle.com/#!18/9ad7f/3

Harry
  • 2,636
  • 1
  • 17
  • 29
-1

Get the last start date by max(startdate) grouping by student name and course. Then self join the table again to get the corresponding finish date.

select a.studentName
     , a.course
     , a.last_start_date as start_date
     , b.finishdate
  from (
select studentName
     , course
     , max(startdate) last_start_date
  from Tab1
 group by studentName, course) a, Tab1 b
 where a.studentName = b.studentName
   and a.course = b.course
   and a.last_start_date = b.startdate;
Kelvin Ho
  • 376
  • 2
  • 3
  • 14