What difference does it make which column SQL COUNT() is run on?

Question

Firstly, this is not asking In SQL, what's the difference between count(column) and count(*)?.

Say I have a users table with a primary key user_id and another field logged_in which describes if the user is logged in right now.

Is there a difference between running

SELECT COUNT(user_id) FROM users WHERE logged_in=1

and

SELECT COUNT(logged_in) FROM users WHERE logged_in=1

to see how many users are marked as logged in? Maybe a difference with indexes?

I'm running MySQL if there are DB-specific nuances to this.

score 12 · Accepted Answer · edited May 23 '17 at 12:18

12

In MySQL, the count function will not count null expressions, so the results of your two queries may be different. As mentioned in the comments and Remus' answer, this is as a general rule for SQL and part of the spec.

For example, consider this data:

user_id   logged_in
1         1
null      1

SELECT COUNT(user_id) on this table will return 1, but SELECT COUNT(logged_in) will return 2.

As a practical matter, the results from the example in the question ought to always be the same, as long as the table is properly constructed, but the utilized indexes and query plans may differ, even though the results will be the same. Additionally, if that's a simplified example, counting on different columns may change the results as well.

See also this question: MySQL COUNT() and nulls

edited May 23 '17 at 12:18

Community

1
1

answered Jan 17 '13 at 21:13

Esoteric Screen Name

6,082
4
29
38

Good answer... providing a concrete example and a related link. – Mr. Boy Jan 17 '13 at 21:18
2

I think it it worth mentioning that is a generic rule of SQL not just your MySQL – Bulat Jan 17 '13 at 21:18
am i the only one who thinks it is not what he asked about ? as he stated it in first line – WKordos Jan 17 '13 at 21:19
@WKordos I don't understand, would you please elaborate? – Esoteric Screen Name Jan 17 '13 at 21:19
I am happy it addresses the main part of the question; I do wonder if there are any issues with performance between counting on e.g. a BOOL Vs string column or anything? – Mr. Boy Jan 17 '13 at 21:24
+1 this is the most important gotcha part from my answer too, and you did post it immediately :) – Remus Rusanu Jan 17 '13 at 21:35
@John The main performance concern would be indexes. An index on a boolean column won't be useful (or exist in a well maintained DB), because it's not meaningfully selective. I don't know of any significant performance issues that would be directly caused by different data types when individually counting two columns, since count only cares whether or not the expression is null, but that of course doesn't mean that none exist. – Esoteric Screen Name Jan 17 '13 at 21:45

Remus Rusanu · Answer 2 · 2013-01-17T21:29:53.390

For the record: the two queries return different results. As the spec says:

Returns a count of the number of non-NULL values of expr in the rows retrieved by a SELECT statement.

You may argue that given the condition for logged_in=1 the NULL logged_in rows are filtered out anyway, and user_id will not have NULLs in a table users. While this may be true, it does not change the fundamentals that the queries are different. You are asking the query optimizer to make all the logical deductions above, for you they may be obvious but for the optimizer may be is not.

Now, assuming that the results are in practice always identical between the two, the answer is simple: don't run such a query in production (and I mean either of them). Is a scan, no matter how you slice it. logged_in has too low cardinality to matter. Keep a counter, update it at each log in and each log out event. It will drift in time, refresh as often as needed (once a day, once an hour).

As for the question itself: SELECT COUNT(somefield) FROM sometable can use a narrow index on somefield resulting in less IO. The recommendation is to use * because this room for the optimizer to use any index it sees fit (this will vary from product to product though, depending on how smart a query optimizer are we dealing with, YMMV). But as you start adding WHERE clauses the possibile alternatives (=indexes to use) quickly vanish.

What difference does it make which column SQL COUNT() is run on?

2 Answers2