1

Q1: why count(*) is so much slower than count(distinct col)?

Q2: should id always use count(distinct col)?

select count(id) from source;
+-----------+
| count(id) |
+-----------+
|     22713 |
+-----------+
1 row in set (0.73 sec)

mysql> select count(distinct id) from source;
+--------------------+
| count(distinct id) |
+--------------------+
|              22836 |
+--------------------+
1 row in set (0.08 sec)
Sanco
  • 13
  • 4
  • what's your schema like? did you run your query with explain? – SMA Dec 30 '15 at 16:26
  • 1
    To know for certain we would need to see table structure, indexes and an explain plan. Count distinct id returns exactly that a distinct count of a field. count(id) will return a count of duplicates if they exist. What's odd about your results is the distinct count is > than count which I would NEVER expect if no data changes occurred. If ID is indexed, it's a simple matter of the engine to count the unique values within the idx where as if it's indexed and has duplicates, it may be having to perform a full table scan. and no you shouldn't always use one, use what you need in your results – xQbert Dec 30 '15 at 16:28
  • It looks to me like you're results are not paired correctly with the SQL used. If they are paired correctly, I can't fathom why count(ID) would ever be > count (Distinct ID) on the same data source. (unless table stats or indexes are corrupt) – xQbert Dec 30 '15 at 16:30

3 Answers3

1

If the column is indexed, COUNT(DISTINCT id) just needs to return the number of items in the index for the column. COUNT(id) has to add up the number of rows that each index entry points to, or scan all the rows.

For your second question, see count(*) and count(column_name), what's the diff?. Most of the time, COUNT(*) is most appropriate; there are some situations, such as counting rows joined with an outer join, where you need to use COUNT(columnname) because you don't want to count the null rows.

Community
  • 1
  • 1
Barmar
  • 741,623
  • 53
  • 500
  • 612
0

It may also be faster if it the query is cached by the mysql

This is my test with about 1.5 million rows, and id is the auto_increment PK

enter image description here

the_nuts
  • 5,634
  • 1
  • 36
  • 68
0

1) Make sure query result is not cached

2) Seems like ID column has NULL params and index. In that case count(id) give count for id with NOT NULL values . If column ID has no NULL param - use COUNT(*). It give you rows count without checking "column !== null" for each row

Vitaly
  • 31
  • 3