0

I am curious about the performance of using COUNT(column_name) twice in a single query. Here is the query in question:

SELECT
    employee_name,
    COUNT(employee_name)
FROM
    employee
GROUP BY
    employee_name
HAVING
    COUNT(employee_name) > 1;

Will

COUNT(employee_name)

be executed twice? Furthermore, how can I check for myself the performance of what is going on under the covers when I have questions like this in the future?

Thanks!

Liem Ta
  • 39
  • 4
  • Use `COUNT(*)` unless you want to avoid counting rows where `employee_name IS NULL`. – Rick James Aug 28 '17 at 20:56
  • Possible duplicate of [Does MySQL eliminate common subexpressions between SELECT and HAVING/GROUP BY clause](https://stackoverflow.com/questions/23824505/does-mysql-eliminate-common-subexpressions-between-select-and-having-group-by-cl) – philipxy Jun 16 '19 at 03:08

2 Answers2

2

You can use optimizer trace to get more knowledge about how the optimizer executes the query and why. For this particular case, the trace does not explicitly tell how many times the count is computed, but we can get information about the temporary table that is used to perform the aggregation:


mysql> SET optimizer_trace='enabled=on';                                               
Query OK, 0 rows affected (0,00 sec)

mysql> SELECT c2, COUNT(c2) FROM temp GROUP BY c2 HAVING COUNT(c2) > 1;
+------+-----------+
| c2   | COUNT(c2) |
+------+-----------+
|    1 |         2 |
|    2 |         2 |
+------+-----------+
2 rows in set (0,00 sec)

mysql> SELECT trace->'$.steps[*].join_execution.steps[*].creating_tmp_table'
    -> FROM information_schema.optimizer_trace;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| trace->'$.steps[*].join_execution.steps[*].creating_tmp_table'                                                                                                               |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [{"tmp_table_info": {"table": "intermediate_tmp_table", "location": "memory (heap)", "key_length": 5, "row_length": 23, "unique_constraint": false, "row_limit_estimate": 729444}}] |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0,01 sec)

mysql> SELECT c2, COUNT(c2) AS c FROM temp GROUP BY c2 HAVING c > 1;
+------+---+
| c2   | c |
+------+---+
|    1 | 2 |
|    2 | 2 |
+------+---+
2 rows in set (0,00 sec)

mysql> SELECT trace->'$.steps[*].join_execution.steps[*].creating_tmp_table'           -> FROM information_schema.optimizer_trace;
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| trace->'$.steps[*].join_execution.steps[*].creating_tmp_table'                                                                                                                       |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [{"tmp_table_info": {"table": "intermediate_tmp_table", "location": "memory (heap)", "key_length": 5, "row_length": 14, "unique_constraint": false, "row_limit_estimate": 1198372}}] |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0,00 sec)

For the above we see that row size for the temporary table is smaller (14 vs 23 bytes) when an alias is used instead of repeating the COUNT expression. This indicates that for your query the counting is done twice during aggregation.

Øystein Grøvlen
  • 1,266
  • 6
  • 8
-1

Pick any handy table and do this:

mysql> SELECT RAND() AS r FROM canada HAVING r < 0.1 limit 11;
+-----------------------+
| r                     |
+-----------------------+
|    0.6982369559800596 |
|   0.33121224616767114 |
|    0.3811396559524719 |
|    0.4718028721136999 |

See also:

Using `rand()` with `having`

Is there Performance related difference in using aggregate function in ORDER BY clause and alias of aggregate function?

And I think there are other discussions involving non-RAND cases.

The original question uses COUNT(employee_name), which delivers the same value in both situations. So, you can't really tell if it was 'evaluated' twice. By using RAND(), it becomes clear that it is reevaluated.

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • 1
    This ia not a good example IMO because in your query `HAVING` is really behaving like `WHERE`. It is one peculiarity of MySQL that `HAVING` has been overloaded in such a way. – Tim Biegeleisen Aug 28 '17 at 23:27
  • @TimBiegeleisen - My point is that, in spite of the alias, it is obviously re-evaluating. – Rick James Aug 28 '17 at 23:38
  • Good point +1. I will update my answer and cite what you said. But in any case, I think the OP was mainly asking in the context of a `GROUP BY` query. – Tim Biegeleisen Aug 29 '17 at 01:26
  • `RAND()` is different from `COUNT()` since it is not an aggregation function. Aggregation functions referred in `HAVING` will have to be computed during aggregation, while other functions will be computed when the `HAVING` clause is evaluated. MySQL also has special code to handle `RAND()`, so I would not trust it as an example of how other functions behave. As my answer shows, if an alias is used for `COUNT`, there will be fewer columns in the temporary table that is used to do the aggregation. – Øystein Grøvlen Aug 29 '17 at 08:18