-1

Here is my RDB structure.

enter image description here

I try to count the number of departments and employees related to a single location.

select street_address, count(distinct(d.department_id)), count(emp.employee_id) 
from locations loc 
inner join departments d 
on d.location_id = loc.location_id
inner join employees emp 
on emp.department_id =d.department_id
group by street_address

Query execution result:

enter image description here

But without using distinct for counting d.department_id it produces wrong result. enter image description here

Could somebody explain what happens during query execution and why distinct fixes this issue?

  • 1
    The best way to find out for yourself, is to remove the group by and select the actual values. I'd start with a single address. – HoneyBadger May 27 '20 at 07:12
  • Please in code questions give a [mre]--cut & paste & runnable code, including smallest representative example input as code; desired & actual output (including verbatim error messages); tags & versions; clear specification & explanation. Give the least code you can give that is code that you show is OK extended by code that you show is not OK. (Debugging fundamental.) For SQL that includes DBMS & DDL (including constraints & indexes) & input as code formatted as a table. [ask] Isolate the first subexpression that doesn't output what you expect & say what you expect & why. – philipxy May 27 '20 at 07:18
  • Please [use text, not images/links, for text--including tables & ERDs](https://meta.stackoverflow.com/q/285551/3404097). Use images only for what cannot be expressed as text or to augment text. Include a legend/key & explanation with an image. Please before considering posting read your textbook and/or manual & google any error message or many clear, concise & precise phrasings of your question/problem/goal, with & without your particular strings/names & site:stackoverflow.com & tags; read many answers. – philipxy May 27 '20 at 07:21
  • 1
    Duplicate of [sum data from multiple tables](https://stackoverflow.com/q/2591390/3404097). This involves a common error where people want some joins, each possibly involving a different key, of some subqueries, each possibly involving join and/or aggregation, but they erroneously try to do all the joining then all the aggregating or to aggregate over previous aggregations. Write separate aggregations over appropriate rows and/or aggregate a case statement picking rows; join on common unique column sets. Sometimes DISTINCT aggregation picks the right values after a non-key (duplicating) join. – philipxy May 27 '20 at 07:26

1 Answers1

2

The reason you are getting wrong count with count(d.department_id) because there is multiple employees who related to same department_id and that is why you are getting same number of department and employees.

when you use count(distinct d.department_id), then distinct will only count each department_id once instead of counting every time it finds employee associated with department_id.

zealous
  • 7,336
  • 4
  • 16
  • 36