
I know they return different results (the first counts nulls, the latter not). That's not my question. Just imagine a case where I don't care (either because there are no nulls, or because there are only a few and I only want a general sense of the number of rows in the database).
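(For concreteness, here is a minimal sketch of that semantic difference using Python's standard-library sqlite3; the NULL-handling rule for COUNT is the same across engines, including SQL Server:)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, c INTEGER)")
# Two of the four rows have a NULL in column c.
conn.executemany("INSERT INTO t (c) VALUES (?)", [(1,), (None,), (3,), (None,)])

print(conn.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # 4: counts all rows
print(conn.execute("SELECT COUNT(c) FROM t").fetchone()[0])  # 2: NULLs are skipped
```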

My question is about the following apparent contradiction:

Here one of the highest rep users in the SQL tag says

Your use of COUNT(*) or COUNT(column) should be based on the desired output only.

On the other hand, here is a 47 times upvoted comment saying

... if you have a non-nullable column such as ID, then count(ID) will significantly improve performance over count(*).

The two seem to contradict each other. So can someone please explain which one is correct, and why?

ispiro
  • Have you benchmarked this? – Terry Carmen Sep 17 '18 at 20:36
  • There are ample resources on the web that answer this question. Two I quickly stumbled on: https://www.mssqltips.com/sqlservertip/4460/sql-server-count-function-performance-comparison/, https://www.sqlskills.com/blogs/paul/which-index-will-sql-server-use-to-count-all-rows/. @gbn is correct. – Gordon Linoff Sep 17 '18 at 20:40
  • Both incorrect comments and incorrect answers can be upvoted -- but incorrect comments can't be *downvoted*, so the votes on those need to be taken with far more grains of salt. Bandwagon voting is very common, while checking if what someone says is actually true is (perhaps surprisingly) not. – Jeroen Mostert Sep 17 '18 at 20:43
  • @JeroenMostert That's a good point. Might help me in the future too. – ispiro Sep 17 '18 at 20:46
  • Interestingly, you can write COUNT() and get the same result as COUNT(*). E.g. you can even write COUNT('NULL'), where 'NULL' is the literal string, not the NULL value. – user1443098 Sep 17 '18 at 20:50
  • Want some real fun? The count function works a bit like exists. You can actually do something like count(1/0) and it will return the count perfectly fine. Check this out. select count(1/0) from sys.databases This is a pretty clear indication it is counting the rows in the index. – Sean Lange Sep 17 '18 at 20:58
  • I would bandwagon upvote @JeroenMostert but he is absolutely correct. – Sean Lange Sep 17 '18 at 20:59
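(The observation in the comments that COUNT of any non-NULL constant behaves like COUNT(*) can be illustrated on any engine; below is a minimal sketch using Python's sqlite3. Note that the count(1/0) trick is SQL Server-specific -- SQLite evaluates 1/0 to NULL rather than raising an error -- so only the string-literal case is shown here:)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (c INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (None,), (3,)])

# The argument 'NULL' is a non-NULL string literal, so it is counted
# for every row -- same result as COUNT(*), regardless of NULLs in c.
lit = conn.execute("SELECT COUNT('NULL') FROM t").fetchone()[0]
star = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(lit, star)  # 3 3
```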

2 Answers


Expanding on scsimon's answer, I ran a test of 10 million rows on SQL Server 2017. Performance results are at the bottom.

EDIT

With suggestions from Jeroen's comment, I added more columns to the test table: an int identity, a bigint not null, and a nullable tinyint. In each case, the select count query chose a scan of the index on the tinyint column. The estimated cost was identical between the queries, but the select count(NullableColumn) was noticeably slower than other select count methods.

Code

if object_id('tempdb..#table') is null
begin
    declare @Count bigint = 10000000

    create table #Table
    (
        id int identity primary key clustered,
        BigID bigint not null,
        TinyC tinyint null
    );

    WITH
        E1(N) AS (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)),
        E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10^2 or 100 rows
        E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10^4 or 10,000 rows max
        E16(N) AS (SELECT 1 FROM E4 a, E4 b), --10^8 or 100,000,000 rows max
        cteTally(N) AS 
        (
            SELECT  ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E16
        )
    insert into #table (BigId, TinyC)
    select N, case when N % 2 = 0 then null else N % 8 end
    from cteTally
    where N <= @Count;

    create unique index IX_BigID on #table (BigID);
    create index IX_TinyC on #table (TinyC);
end

set statistics io on
set statistics time on

print 'count(*)'
select count(*) from #table
option (maxdop 1)

print 'count(ID)'
select count(ID) from #table
option (maxdop 1)

print 'count(BigID)'
select count(BigID) from #table
option (maxdop 1)

print 'count(TinyC)'
select count(TinyC) from #table
option (maxdop 1)

set statistics io off
set statistics time off

--drop table #table

Performance

count(*)

Table '#Table'. Scan count 1, logical reads 13617, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 735 ms, elapsed time = 746 ms

count(ID) -- int identity primary key clustered

Table '#Table'. Scan count 1, logical reads 13617, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 765 ms, elapsed time = 776 ms

count(BigID) -- bigint not null, indexed

Table '#Table'. Scan count 1, logical reads 13617, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 735 ms, elapsed time = 731 ms

count(TinyC) -- tinyint nullable, indexed

Warning: Null value is eliminated by an aggregate or other SET operation.

Table '#Table'. Scan count 1, logical reads 13617, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 1593 ms, elapsed time = 1584 ms.

Paul Williams
  • An even better test (in the sense of showing all the variants) would be a table of `ID INT IDENTITY PRIMARY KEY, BigID BIGINT NOT NULL, TinyC TINYINT NULL`, and index both `BigID` and `TinyC` separately. Expected results: both `COUNT(ID)` and `COUNT(*)` are satisfied using the index on `BigID`, while `COUNT(TinyC)` is satisfied using the index on `TinyC`, and is the costliest one. If the index on `BigID` is dropped, `COUNT(*)` is satisfied using the clustered PK index. Disclaimer: none of this I've verified myself. If I'm wrong, I'm wrong, but it'll still be interesting. – Jeroen Mostert Sep 17 '18 at 21:26
  • @JeroenMostert A `tinyint` column would have many duplicate values with millions of rows, but we can do that. Editing post with results now. – Paul Williams Sep 17 '18 at 21:45
  • I actually imagined the column with all-`NULL` values (that is, *all* duplicates), so that's OK. In retrospect, I do feel dumb for not realizing it would be the smallest covering index even for the non-`NULL` counts -- of course it is, all it has to do is scan it without reading the values in the leaf nodes! – Jeroen Mostert Sep 17 '18 at 22:03

I would expect count(column) to be a bit slower, but that wasn't the case in my quick test (below). What's more important is the first link you posted: count(*) and count(column) can produce different results when the column is nullable. Also, I'm assuming you aren't returning any other columns, which removes tests that would be unique to your environment and indexes.

WITH
    E1(N) AS (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)),
    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10^2 or 100 rows
    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10^4 or 10,000 rows max
    cteTally(N) AS 
    (
        SELECT  ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
    )
select  N
into #table
from cteTally


update #table
set N = null
where N % 2 = 0

select count(*) from #table
select count(N) from #table

(screenshot of query results omitted)

Similarly, on a table with 361,912 rows, here are the results for count(*), count(pk_column), and count(nullable_column), which isn't part of an index:

(screenshot of query results omitted)

S3S
  • Without an index of any sort, both have to scan the entire table, so this is not surprising. The performance difference arises when `count(*)` can use an index. Based on my understanding, it will use the most efficient index available. – Gordon Linoff Sep 17 '18 at 20:43
  • @GordonLinoff i added an example with an indexed table. – S3S Sep 17 '18 at 20:45
  • @GordonLinoff Your comment here confuses me. According to this, your comment to my question should have been `both are wrong. * is better (or same)`. – ispiro Sep 17 '18 at 20:45
  • From my test, they are the same @GordonLinoff – S3S Sep 17 '18 at 20:46
  • Are there any other indexes on non-nullable columns on `PatientAudit`? If not, this result is not surprising. If there are, it is (slightly). – Jeroen Mostert Sep 17 '18 at 20:59
  • Highly recommend running the `count` queries with `set statistics io on;` and `set statistics time on;` to verify if the actual CPU time and logical reads performed in each case are the same. – Paul Williams Sep 17 '18 at 21:01
  • Good, my world is sane again. Because, like Gordon (but presently with no access to a setup to test it) I do believe the optimizer will implement `SELECT COUNT(*)` by scanning the smallest non-filtered index, which is almost never the clustered index. And it should optimize `SELECT COUNT([non-nullable column])` the same way, but again, no way to test that currently. Only `SELECT COUNT([nullable column])` is the odd one out, due to the requirement to actually read what's in `[nullable column]` for every matching row. – Jeroen Mostert Sep 17 '18 at 21:02
  • For fun throw in count(1/0) in the mix. :) – Sean Lange Sep 17 '18 at 21:02
  • @JeroenMostert that's fair – S3S Sep 17 '18 at 21:07
  • Interesting... on my SQL 2017 machine, I am getting `select count(NullableColumn)` to be about 2x slower in terms of CPU time and elapsed time than `select count(*)`. – Paul Williams Sep 17 '18 at 21:12