-3

I want to filter out the record with the highest salary for each department. If there are multiple records that meet this condition in a department, keep all of them.Then I write SQL following:

SELECT id, name, salary, departmentId
  FROM Employee
  GROUP BY departmentId
  HAVING salary = MAX(salary)

But it's not working correctly.

the Employee table:

+----+-------+--------+--------------+
| id | name  | salary | departmentId |
+----+-------+--------+--------------+
| 1  | Joe   | 70000  | 1            |
| 2  | Jim   | 90000  | 1            |
| 3  | Henry | 80000  | 2            |
| 4  | Sam   | 60000  | 2            |
| 5  | Max   | 90000  | 1            |
+----+-------+--------+--------------+

id is the primary key column for this table.

The result of the execution is:

| id | name  | salary | departmentId |
| -- | ----- | ------ | ------------ |
| 3  | Henry | 80000  | 2            |

But the result that I expected is like this:

| id | name  | salary | departmentId |
| -- | ----- | ------ | ------------ |
| 2  | Jim   | 90000  | 1            |
| -- | ----- | ------ | ------------ |
| 3  | Henry | 80000  | 2            |
| -- | ----- | ------ | ------------ |
| 5  | Max   | 90000  | 1            |

I want to know why I can't get the result that I expect? what's wrong with my SQL? I know how to write the correct SQL statement now. I just want to know what exactly is wrong with my SQL statement. Analysis of cause is more important for me.

Flamer
  • 5
  • 4
  • 1
    `GROUP BY` returns *eliminates* all the original group rows and returns only one row per group. This query is invalid SQL to begin with and only works in MySQL due to historical quirks. Which of the possibly thousands of `salary` values per group should be returned? – Panagiotis Kanavos Jul 14 '23 at 14:27
  • 1
    Your query is syntactically incorrect, you should enable `only_full_group_by` – Stu Jul 14 '23 at 14:27
  • 1
    To compare a row's salary with the group's maximum you 'll have to use a subquery, eg `salary = (SELECT MAX(salary) from Employee e where E.DepartmentID=Employee.DepartmentID)` or use a ranking function eg `RANK() OVER(PARTITION BY DepartmentID, ORDER BY salary desc)` to calculate the employee rank by salary in the department and allow only the 1st. You'll have to test which is faster – Panagiotis Kanavos Jul 14 '23 at 14:31
  • @PanagiotisKanavos I know how to write the correct SQL. I'm just curious about where my SQL statement went wrong. Can't the HAVING clause be used to filter out all records within a group that have the maximum salary? – Flamer Jul 14 '23 at 14:36
  • The question's SQL is doubly wrong - it's invalid syntax *and* makes no sense. I explained why. `GROUP BY` *eliminates* the original rows so you can't its results to find original rows. HAVING filters group rows, it doesn't ungroup the group to get to the originals. – Panagiotis Kanavos Jul 14 '23 at 15:02
  • @PanagiotisKanavos `GROUP BY returns eliminates all the original group rows and returns only one row per group. ` I don't think it's right. for example, `select departmentId, MAX(salary) FROM Employee GROUP BY departmentId`, if `GROUP BY` clause returns only one row per group, the `MAX()` function will not work correctly. Is there a problem with my understanding? – Flamer Jul 14 '23 at 15:07
  • @Flamer you can try it with any database or dbFiddle. `GROUP BY` creates one row per group. Aggregate functions on non-group columns calculate the value that will be returned in that row. `MAX(salary)` works on the group's rows to return a single value, the maximum. After GROUP BY there's no way to refer to the group's original rows. `WHERE` can only the rows *before* grouping, which is why HAVING was added to filter the group rows – Panagiotis Kanavos Jul 14 '23 at 15:16
  • @PanagiotisKanavos But the execution order is `GROUP BY` --> `SELECT`。 The `MAX()` function in the `SELECT` statement, Shouldn't the `MAX()` function be executed after the `GROUP BY` statement in the query? – Flamer Jul 14 '23 at 15:27
  • 1
    SQL is a declarative language. It describes *what* you want to do, not how it should be done or what order. In fact, it was first created as a textual query language for *business* users, not programmers. The database engine will parse that query, apply optimizations based on indexes, data sizes etc and come up with an execution plan that may be very different than what you assumed from the query. – Panagiotis Kanavos Jul 14 '23 at 15:44
  • Check the execution plan to see what's actually going on. The database may use the columns in SELECT to choose to read only from an index ( if it contains all columns) or while reading rows from a table. That happens before any attempt at aggregation. If GROUP BY is done on the index columns you may see that the rows are read in a stream and aggregates calculated on the fly. If not, you may see that the data gets stored temporarily before the server calculates aggregates on the temporary data. – Panagiotis Kanavos Jul 14 '23 at 15:48
  • @PanagiotisKanavos Thank you for patiently answering my questions. Indeed, there is an issue with my understanding. – Flamer Jul 14 '23 at 15:58

2 Answers2

2

You can have this by using a subquery to find the max salary with respect to the department.

SELECT 
  * 
FROM
  `Employee` e 
WHERE e.`salary` = 
  (SELECT 
    MAX(e1.`salary`) 
  FROM
    `Employee` e1 
  WHERE e1.`departmentId` = e.`departmentId`)

OR

SELECT e.id, e.name, e.salary, e.departmentId
FROM Employee e
INNER JOIN (
  SELECT departmentId, MAX(salary) AS max_salary
  FROM Employee
  GROUP BY departmentId
) subquery
ON e.departmentId = subquery.departmentId AND e.salary = subquery.max_salary;

This answer also works for MySQL 5.6 If you are using MySQL 8.X, you have your answer already in the comment section

Asgar
  • 1,920
  • 2
  • 8
  • 17
  • 1
    The edit is wrong. `HAVING` is only used to filter the GROUP BY results. HAVING isn't the same as WHERE – Panagiotis Kanavos Jul 14 '23 at 15:05
  • Thanks for pointing that out @PanagiotisKanavos, I happened to have copied the wrong query from my sqlyog that I just tried. Answer edited :) – Asgar Jul 14 '23 at 15:07
  • Ohh okay, I get that now. Thanks for the lesson. I am removing the edited part completely. – Asgar Jul 14 '23 at 15:15
  • This question has already been answered dozens of times in SO, both for v8 and pre v8... – Shadow Jul 14 '23 at 15:42
  • @Shadow He/She misunderstand my question, I just want to know where the root cause of my SQL error is. – Flamer Jul 14 '23 at 16:03
0
select 
se.id, 
se.name, 
se.salary, 
se.department_id 
from stack_employees as se
join (
    select 
    department_id, 
    max(salary) as max_salary 
    from stack_employees
    group by department_id) di
on se.department_id = di.department_id
and se.salary = di.max_salary;

this will give you the desired output.

Asgar
  • 1,920
  • 2
  • 8
  • 17
curious
  • 17
  • 4