1

I have a table called RESULTS like this :

RESULTS_IDN    NAME    SUBJECT    YEAR    QUALIFIED
1              MARK    ENGLISH    1989    N
3              MARK    ENGLISH    1991    N
5              MARK    ENGLISH    1993    Y
7              MARK    ENGLISH    1995    N
2              MARK    MATH       1990    N
5              MARK    MATH       1993    N
6              MARK    MATH       1995    Y
4              MARK    SCIENCE    1991    N
9              MARK    SCIENCE    1997    Y

I need to know the Qualification Status of the CANDIDATE for a SUBJECT for the LATEST exam he has written , how do I write a query for this (ORACLE/MSSQL) ?

For example Input

NAME,SUBJECT  OUTPUT NAME IDN SUBJECT YEAR Q
MARK,ENGLISH  OUTPUT MARK 7   ENGLISH 1995 N 
MARK SCIENCE  OUTPUT MARK 9   SCIENCE 1997 Y
MARK MATH     OUTPUT MARK 6   MATH    1995 Y

I know of one way to solve this .

(SELECT NAME SUBJECT YEAR MAX(YEAR) YEAR
FROM RESULTS WHERE NAME = 'MARK' AND SUBJECT ='MATH'
GROUP BY NAME SUBJECT YEAR) LATEST 

Join the above table back on IDN to the same table and I can get the results . But this is double work . Is there anyway I can club the MAX(YEAR) and get the CORRESPONDING YEAR using HAVING CLAUSE or something ? I need 2 operations on the GROUP BY data , one Latest , and the corresponidng Qualified status .

PS : Ofcourse there are records for 100 candidates like this in the DB .

Update : This question is also categorized as greatest-n-per-group problem as per answer 2. Interesting to know it is a classified problem in DB .

Nishant
  • 20,354
  • 18
  • 69
  • 101

2 Answers2

3

In both Oracle and SQL Server you can use the analytic/windowing functions RANK() or ROW_NUMBER() to achieve this:

select *
  from ( select a.*
              , rank() over ( partition by name, subject order by year desc ) rnk
           from ... a
                )
 where rnk = 1

RANK() will return 1 for every row that is the newest per name and subject, ROW_NUMBER() will return a random row.

In Oracle alone you can use KEEP to give you the same result:

select name, subject, max(year) as year
     , max(qualified) keep (dense_rank first order by year desc) as qualified
  from ...
 group by name, subject
Ben
  • 51,770
  • 36
  • 127
  • 149
  • Thanks ! I skipped reading about analytic/windowing function ... which someone asked me to read up . – Nishant Jun 29 '13 at 15:09
  • My sublte question however is , whether we can use aggregrate function and then retrieve a parm based on that from the 'other' rows exlcuded in group_by . – Nishant Jun 29 '13 at 15:24
  • 1
    The Oracle syntax in the answer gives you what you're asking for @Nishant. You can't do this with SQL Server though. – Ben Jun 29 '13 at 15:35
  • Is there any technical term for what I am asking , its basically a condition over another condition sort of thing in Group By . Just curious . Having just checks for something 'over' a Group , but we want to collect something if that matches , it can't be classified as a Predicate , And Or etc - since Having works over a Group . Also Ben if I need Oracle MSSQL compatible code , we need to use my solution ? – Nishant Jun 29 '13 at 15:48
  • 1
    I think it's just still an aggregate function @Nishant, though with an additional ORDER BY. You don't have to use your solution, as you mention it will mean two scans of your table/index. You can use the analytic functions I suggest initially, which are both SQL Server and Oracle compatible (and will involve only one scan). – Ben Jun 29 '13 at 15:56
1

This is all-the-time reinvented problem of :

SELECT t1.*
FROM RESULTS AS t1
LEFT JOIN RESULTS AS t2
  ON t1.NAME = t2.NAME AND t1.SUBJECT = t2.SUBJECT AND t1.YEAR < t2.YEAR
WHERE t2.NAME IS NULL
Tomas
  • 57,621
  • 49
  • 238
  • 373
  • Thanks for pointing out to the problem ,it even has a name ! As Ben said I am also apprehensive about 2 index scans which is not required ideally for this problem . – Nishant Jun 30 '13 at 04:34
  • Well you are both correct that there are other very interesting solutions (like the one of @Ben) which may be faster.. – Tomas Jun 30 '13 at 07:54
  • ... however beware @Nishant, these are not portable! – Tomas Jun 30 '13 at 07:54
  • Windowing functions are available in all major RDBMS bar MySQL, so saying it's "not portable" is more than a little misleading. It'll work on DB2, Oracle, Postgres, SQL Server, Sybase, Informix, Teradata and (I believe) Firebird 3. They are also the standard way to solve this problem, once again in every RDBMS save MySQL as they are provably the fastest way to obtain the data. The only possible exception is Oracle, who's non-standard keep dense_rank extension to the SQL Standard can be quicker and more efficient than a windowing/analytic function. – Ben Jun 30 '13 at 08:06
  • @Ben, but MySQL is one of the most used :-) I mean, all these solutions are out of SQL standard, and thus not portable. Dot. – Tomas Jun 30 '13 at 08:09
  • @Thomas , this is actually the solution our DB guys use , I was looking for a faster one and I think Ben's is faster. However thanks for jotting down the code for others to read ... and also for letting me know 'greatest-n-group' problem classification . One of the reason I asked is to know if this problem has been classified somewhere :-) because it looks like a trivial issue . I think SO should have multiple select for best answer ! – Nishant Jun 30 '13 at 08:11
  • The OP isn't using it @Tomas, they're using SQL Server and Oracle as described in the question. MySQL may be seemingly one of the most used (SO doesn't really count and I can't find any recent market research) but it's not going to have 50% market coverage. Yes, they was introduced in SQL 2003. – Ben Jun 30 '13 at 08:12
  • 1
    @Nishant no problem, Bens answer is the best - I just wanted to make sure you know that this is not portable solution. – Tomas Jun 30 '13 at 08:14