0

I am learning postgresql and Inner join I have following table. Employee

Id Name    DepartmentId
1  John S.    1
2  Smith P.   1
3  Anil K.    2

Department

Department
Id Name
1  HR
2  Admin

I want to query to return the Department Name and numbers of employee in each department.

SELECT Department.name , COUNT(Employee.id) FROM Department INNER JOIN Employee ON Department.Id = Employee.DepartmentId Group BY Employee.department_id;

I dont know what I did wrong as I am new to database Query.

lord stock
  • 1,191
  • 5
  • 32

1 Answers1

3

When involving all rows or major parts of the "many" table, it's typically faster to aggregate first and join later. Certainly the case here, since we are after counts for "each department", and there is no WHERE clause at all.

SELECT d.name, COALESCE(e.ct, 0) AS nr_employees
FROM   department d
LEFT   JOIN (
   SELECT department_id AS id, count(*) AS ct
   FROM   employee
   GROUP  BY department_id
   ) e USING (id);

Also made it a LEFT [OUTER] JOIN, to keep departments without any employees in the result. And COALESCE to report 0 employees instead of NULL in that case.

Related, with more explanation:

Your original query would work too, after fixing the GROUP BY clause:

SELECT department.name, COUNT(employee.id)
FROM   department
INNER  JOIN employee ON department.id = employee.department_id
Group  BY department.id;  --!

That's assuming department.id is the PRIMARY KEY of the table, in which case it covers all columns of that table, including department.name. And you may want LEFT JOIN like above.

Aside: Consider legal, lower-case names exclusively in Postgres. See:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • . . Why is it faster to aggregate first? If there is an index on `department`, I can see that it is true. But if the join is an outer join or if the inner join filters out a significant number of rows, then I would expect the join first to be faster. – Gordon Linoff May 19 '21 at 12:06
  • @GordonLinoff: Why is it faster? Short answer: try it. Long answer is short, too: because there are vastly fewer join operations after condensing the many rows down to a fraction with counts. True, for a selective predicate, it's typically cheaper to join first (or use various alternative query techniques), but this query retrieves counts for *"each department"*. No `WHERE` clause. The first variant will be substantially faster. – Erwin Brandstetter May 19 '21 at 18:53
  • But before the `group by`, the join operations can use an index on `employee(departmentid)`. Your statement also sounds more like an absolute (i.e. this is almost always faster) and not specific to this particular query. That's why I'm asking. – Gordon Linoff May 19 '21 at 19:04
  • Oh, not absolute. The statement is in response to this particular question, I'll make that clearer. It applies when all rows or a major part of the "many" table is involved. (Some other details matter, too ...) – Erwin Brandstetter May 19 '21 at 19:11
  • . . Phew. I thought I might be missing something. Cheers. – Gordon Linoff May 19 '21 at 19:12