0

I am facing a conceptual issue. Would be really grateful for your help.

I have two tables one is accounts(that has id, sales_rep_id, name as columns) and sales_reps( that has id, name as columns). I want to do the following query:

How many of the sales reps have more than 5 accounts that they manage?

I got the answer to it but I have a conceptual issue here, what does COUNT(*) point here ( number of accounts is okay but I am unable to get how could we justify that).

Thank you!

SELECT s.id, s.name, COUNT(*) num_accounts
FROM accounts a
JOIN sales_reps s
ON s.id = a.sales_rep_id
GROUP BY s.id, s.name
HAVING COUNT(*) > 5
ORDER BY num_accounts;

2 Answers2

1

This might be easier to understand if think of it without the join:

SELECT a.sales_rep_id, COUNT(*) as num_accounts
FROM accounts a
GROUP BY a.sales_rep_id
HAVING COUNT(*) > 5
ORDER BY num_accounts;

This is counting the number of rows in accounts with the same sales rep. The join is just bringing in the name of the sales_rep; it is not changing the number of rows (assuming that the join keys match).

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • OK . Thank you Got that point. But suppose if I were using FROM sales_reps then what modification would have been required in COUNT(*). I guess then we should have used COUNT(a.id) instead. Right?? Or something else – Ruhan Saini Aug 25 '21 at 00:22
  • @RuhanSaini . . No, your query is fine. I thought this might help explain what is happening. – Gordon Linoff Aug 25 '21 at 00:24
  • Yes I understood your point. Just as an extension I had been asking. ```SELECT s.id, s.name, COUNT(*) num_accounts FROM sales_reps s JOIN accounts a ON s.id = a.sales_rep_id GROUP BY s.id, s.name HAVING COUNT(*) > 5 ORDER BY num_accounts;``` Now will it be wise to use COUNT(*) or should I use COUNT(a.id) – Ruhan Saini Aug 25 '21 at 00:29
  • @RuhanSaini . . . It doesn't make a difference. When counting rows, `COUNT(*)` is the normal approach. Counting a column requires counting non-`NULL` values. – Gordon Linoff Aug 25 '21 at 00:31
  • Ok Thanks. Just by texting I am unable to put my doubt forward. But just one last query that I think should clear all my doubts. When does this COUNT(*) execute before JOIN or after JOIN – Ruhan Saini Aug 25 '21 at 00:37
  • @RuhanSaini . . . The `FROM` clause generates the rows. The `GROUP BY` aggregates them. The `COUNT(*)` is part of the aggregation. – Gordon Linoff Aug 25 '21 at 00:39
  • The `FROM` and `JOIN` expressions are evaluated _FIRST_ to construct a single flat recordset. Then `WHERE` is evaluated to filter the rows, then `GROUP BY` chunks that single resultset into separate _groups_. The `Count` is evaluated across each group individually. – Chris Schaller Aug 25 '21 at 00:45
0

The * indicates that all rows should be evaluated to determine the count, regardless if the nullability or uniqueness.

There is a great summary here: MySQL COUNT And COUNT DISTINCT With Examples

COUNT(*) function returns the no. of rows retrieved by the SELECT statement including rows containing NULL and Duplicate values

SELECT COUNT(*) FROM {tableName}

COUNT(Expression) would count the value where expression is not null. Expression can be something simple like a column name or a complex expression such as IF Function.

SELECT COUNT(Expression) from {tableName}

COUNT(DISTINCT Expression) - DISTINCT keyword would result in counting only unique non null values against the expression.
For example - COUNT(DISTINCT customerName) - would only count rows having distinct values for customer name

SELECT COUNT(DISTINCT expression) from {tableName}

From a performance point of view, in MySQL and most other RDBMS the COUNT(*) specifically avoids any value comparisons, where as the other forms of count will inspect the column values to a degree.

If there is a unique value in the resultset and that column is indexed, then you may see some performance gains by counting just that column, instead of all rows.

Other reading:

Chris Schaller
  • 13,704
  • 3
  • 43
  • 81