How to eliminate duplicate rows based on column max value in sql?

Question

I have a view which returns duplicate rows having field performance, which can be 1,2,3 or 4. I need to select only rows which is having max performance. How to do this ? Tried this :

View :

enter image description here

results needed as follows :

enter image description here

So basically for each employee number group, I need to fetch what is max calender value and corresponding performance value

Miising details such as structure of the view and RDBMS that you are using. — Martin Smith, Jan 05 '14 at 13:30
You write that you want the rows with max performance, but your query looks like you want the rows with max "Calender" values. — Stefan Dorner, Jan 05 '14 at 13:50
It would be nice if you add an example - about a dozen of rows in a view and the goal result of a query you try to d esign — Arioch 'The, Jan 05 '14 at 14:58
Still you did not told what the sql server is. And what if there would be more than single row having same Employee and same max Calender? — Arioch 'The, Jan 05 '14 at 18:13

wildplasser · Answer 1 · 2014-01-05T15:41:32.170

3

You can express

the employee record with the highest performance

as:

there is no record (for this employee) with a higher performance

And that gives:

SELECT * FROM employee e
WHERE NOT EXISTS (
   SELECT *
   FROM employee nx
   WHERE nx.employee_nr = e.employee_number
   AND nx.performance > e.performance
   );

edited Jan 05 '14 at 15:41

answered Jan 05 '14 at 15:33

wildplasser

43,142
8
66
109

1

with some RDBMS engines like Firebird/Interbase "NOT EXIST" leads to sub-select for every row. I think this query better reformulated as LEFT JOIN with IS NULL condition. That would be less easy to read, but sometimes considerably faster – Arioch 'The Jan 05 '14 at 18:55
In that case Firebase is an exception. For most DMBSs the performance is the same (even the query plans are the same) or even better than the LEFT JOIN ugliness. The probable reason for this is that EXISTS has existed allmost from the beginneng, while LEFT JOINS were introduced by sql92. (in the old days we had to emulate LEFT JOINS by means of `select ... from a,b WHERE ... UNION ALL SELECT FROM a where not exists (SELECT ... FROM b ..);` – wildplasser Jan 05 '14 at 19:05
Dunno why. Firebase is really different, for example `insert into TABLENAME select * from TABLENAME` fork bomb :-) I'd expect from small cheap databases like MySQL, Anywhere, SQLite to has less optimizations available as well. I remember I tried to bottom-up calculations of MLM-like tree. `UPDATE table SET (parent re-calculation) where not exists (immediate children that were not calculated themselves)` - looped until "0 rows affected" - in old Interbase 5.5 this ignored all the indices and made a perfect NATURAL x NATURAL :-| – Arioch 'The Jan 05 '14 at 19:32

Arioch 'The · Answer 2 · 2014-01-05T18:52:16.560

You need to do two queries: read max calender for any given employee group, then to select the rows with the same those values is calender and group.

  Select vm."Employee Number" as eGroup, max(vm.Calender) as Calender From view1 vm

That part is obvious. The problem is how to inject it as a criterion for

  Select vd."Employee Number", vd.Calender, vd.Performance From view1 vd where ... ?

Since aggregate functions (like max) gives a singular result, you can use JOIN without risk of having Cartesian sets production (when all combinations of rows from two selects are tried, NxM)

  Select vd."Employee Number", vd.Calender, vd.Performance From view1 vd 
  Join (Select "Employee Number" as eGroup, max(vm.Calender) as maxCal From view1) as vm
    On (vd."Employee Number" = vm.eGroup) and (vd.Calender = vm.maxCal)

Note, this still can produce several rows with the same Employee Number and Calender if that was how they were in the table. Unless you have a UNIQUE INDEX upon both those columns.

score 0 · Answer 3 · edited May 23 '17 at 12:32

Thanks all for your inputs. I tried a slightly different approach for this.

Step 1 : Since I did not have any unique identifier for each row, I made one by concatenating Employee_Number and Calender columns. It becomes easier then and we can use following :

SELECT DISTINCT 
      T1.EMPLOYEE_NUMBER, T1.Performance, T1.Calender
FROM  dbo.view1 AS T1 LEFT OUTER JOIN dbo.view1 AS T2
       ON T2.EMPLOYEE_NUMBER = T1.EMPLOYEE_NUMBER
       AND T2.maxval > T1.maxval
WHERE     (T2.maxval IS NULL)

Here Maxval is Concatenated column.

Actual answer is here : SQL Server: Only last entry in GROUP BY

score 0 · Answer 4 · edited Mar 23 '19 at 02:05

0

Your stated goal doesn't match your results, but I would do this (at least in DB2):

select e.Employee_Number, e.Calendar, e.Performance
from Employee e
where e.Calendar in (select max(Calendar) from Employee group by Employee_Number)

edited Mar 23 '19 at 02:05

Nick

138,499
22
57
95

answered Mar 23 '19 at 02:04

Brian Minturn

27
6

Dylan · Answer 5 · 2014-01-05T14:24:53.843

-1

 select v1.performance,max(v1.field)
 from view1 v1
 group by v1.performance

edited Jan 05 '14 at 14:24

answered Jan 05 '14 at 14:14

Dylan

16
1
3

avoid code only answer, consider adding some text as explanation – Stephane Rolland Jan 05 '14 at 14:40
1

Minus 1. This doesn't even come close to answering the question. – Dan Bracuk Jan 05 '14 at 14:42

How to eliminate duplicate rows based on column max value in sql?

5 Answers5

Linked