269

I often find these three variants:

SELECT COUNT(*) FROM Foo;
SELECT COUNT(1) FROM Foo;
SELECT COUNT(PrimaryKey) FROM Foo;

As far as I can see, they all do the same thing, and I find myself using the three in my codebase. However, I don't like to do the same thing different ways. To which one should I stick? Is any one of them better than the two others?

zneak
  • 22
    +1, I didn't even know, `SELECT COUNT(PrimaryKey) FROM Foo;` was even an option – Anthony Forloney Apr 26 '10 at 01:16
  • 23
    IMO, if you don't know the difference, pick one and stick with it. If you can't be right, at least be consistent. – Frank Farmer Apr 26 '10 at 01:18
  • 18
    @Anthony Forloney: let's make it clear that `PrimaryKey` refers to the name of your primary key field, and that it's not some magical keyword. – zneak Apr 26 '10 at 01:21
  • 7
    @zneak, Yeah, I realized that when MySQL threw me an error *Unknown column "primarykey" in 'field list'* good job me. – Anthony Forloney Apr 26 '10 at 01:24
  • 3
    @gbn: yeah it's possible duplicate. but not exact duplicate, the OP takes into account the COUNT(PrimaryKey) construct. so that made it not exact duplicate. it's a topic of its own, contrasting it with the other two approaches – Hao Apr 28 '10 at 02:25

5 Answers

268

Bottom Line

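Use either COUNT(field) or COUNT(*), and stick with it consistently. Keep in mind that COUNT(field) skips rows where field is NULL, so the two agree only on a non-nullable field. If your database allows COUNT(tableHere) or COUNT(tableHere.*), use that.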

In short, don't use COUNT(1) for anything. It's a one-trick pony that rarely does what you want, and in those rare cases it is equivalent to COUNT(*).

Use count(*) for counting

Use * for all queries that need to count everything. Even for joins, use *:

SELECT boss.boss_id, COUNT(subordinate.*)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.boss_id

But don't use a bare COUNT(*) with LEFT JOINs, as it will return 1 even if the subordinate table has no rows matching the parent table:

SELECT boss.boss_id, COUNT(*)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.boss_id
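
The LEFT JOIN pitfall above can be reproduced with a small sketch. It uses Python's built-in sqlite3 module purely for convenience; the sample names (John, Paul, George, Ringo) and the boss_name and subordinate_name columns are assumptions made for illustration. COUNT(subordinate.*) is PostgreSQL syntax, so the sketch counts subordinate.boss_id instead.

```python
import sqlite3

# In-memory database with hypothetical sample data:
# John has two subordinates, Paul has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE boss (
        boss_id   INTEGER PRIMARY KEY,
        boss_name TEXT NOT NULL
    );
    CREATE TABLE subordinate (
        subordinate_id   INTEGER PRIMARY KEY,
        boss_id          INTEGER NOT NULL REFERENCES boss (boss_id),
        subordinate_name TEXT NOT NULL
    );
    INSERT INTO boss VALUES (1, 'John'), (2, 'Paul');
    INSERT INTO subordinate VALUES (1, 1, 'George'), (2, 1, 'Ringo');
""")

query = """
    SELECT boss.boss_name,
           COUNT(*)                   AS star_count,
           COUNT(subordinate.boss_id) AS subordinate_count
    FROM boss
    LEFT JOIN subordinate ON subordinate.boss_id = boss.boss_id
    GROUP BY boss.boss_id, boss.boss_name
    ORDER BY boss.boss_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    # Paul's LEFT JOIN row is padded with NULLs: COUNT(*) still counts
    # that row, while COUNT(subordinate.boss_id) skips the NULL.
    print(row)
```

For Paul, star_count comes out as 1 and subordinate_count as 0, which is the difference the paragraph above warns about.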

Don't be fooled by those who advise that * in COUNT fetches the entire row from your table and is therefore slow. The * in SELECT COUNT(*) and the * in SELECT * have no bearing on each other. They are entirely different things that just share a common token, i.e. *.

An alternate syntax

In fact, if it were not permitted to give a field the same name as its table, RDBMS language designers could give COUNT(tableNameHere) the same semantics as COUNT(*). Example:

For counting rows we could have this:

SELECT COUNT(emp) FROM emp

And they could make it simpler:

SELECT COUNT() FROM emp

And for LEFT JOINs, we could have this:

SELECT boss.boss_id, COUNT(subordinate)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.boss_id

But they cannot do that (COUNT(tableNameHere)), since the SQL standard permits naming a field with the same name as its table:

CREATE TABLE fruit -- ORM-friendly name
(
fruit_id int NOT NULL,
fruit varchar(50), /* same name as table name, 
                and let's say, someone forgot to put NOT NULL */
shape varchar(50) NOT NULL,
color varchar(50) NOT NULL
)

Counting with null

Also, it is not good practice to make a field nullable if its name matches the table name. Say the fruit field holds the values 'Banana', 'Apple', NULL, 'Pears'. The following will not count all rows; it yields 3, not 4:

SELECT count(fruit) FROM fruit
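
A quick way to check the 3-versus-4 claim is the sketch below, which runs the query against an in-memory SQLite database through Python's sqlite3 module. The fruit_id, shape, and color values are made up for illustration; only the four fruit values come from the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fruit (
        fruit_id INTEGER NOT NULL,
        fruit    VARCHAR(50),          -- nullable, as in the example above
        shape    VARCHAR(50) NOT NULL,
        color    VARCHAR(50) NOT NULL
    );
    -- shape and color values are hypothetical filler
    INSERT INTO fruit VALUES
        (1, 'Banana', 'long',  'yellow'),
        (2, 'Apple',  'round', 'red'),
        (3, NULL,     'round', 'orange'),
        (4, 'Pears',  'pear',  'green');
""")

# COUNT(*) counts rows; COUNT(fruit) counts only non-NULL fruit values.
all_rows, named_rows = conn.execute(
    "SELECT COUNT(*), COUNT(fruit) FROM fruit"
).fetchone()
print(all_rows, named_rows)
```

Here all_rows is 4 and named_rows is 3, because the row with a NULL fruit is skipped by COUNT(fruit).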

Some RDBMSs do follow that principle and accept a table name as COUNT's parameter for counting the table's rows. This works in PostgreSQL (if there is no subordinate field in either of the two tables below, i.e. as long as there is no name conflict between a field name and a table name):

SELECT boss.boss_id, COUNT(subordinate)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.boss_id

But that could cause confusion later if we add a subordinate field to the table, as it will then count the field (which could be nullable), not the table rows.

So to be on the safe side, use:

SELECT boss.boss_id, COUNT(subordinate.*)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.boss_id

count(1): The one-trick pony

COUNT(1) in particular is a one-trick pony. It works well only on single-table queries:

SELECT COUNT(1) FROM tbl

But that trick won't work on multi-table queries with joins without its semantics becoming confused. In particular, you cannot write:

-- count the subordinates that belong to each boss
SELECT boss.boss_id, COUNT(subordinate.1)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.boss_id

So what's the meaning of COUNT(1) here?

SELECT boss.boss_id, COUNT(1)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.boss_id

Is it this...?

-- counting all the subordinates only
SELECT boss.boss_id, COUNT(subordinate.boss_id)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.boss_id

Or this...?

-- or will COUNT(1) also count 1 for a boss, regardless of whether the boss has a subordinate?
SELECT boss.boss_id, COUNT(*)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.boss_id

With careful thought, you can infer that COUNT(1) is the same as COUNT(*), regardless of the type of join. But for a LEFT JOIN's result, we cannot mold COUNT(1) to work like COUNT(subordinate.boss_id) or COUNT(subordinate.*).
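
The equivalence of COUNT(1) and COUNT(*) across join types can be checked with a rough sketch like the following. It runs on SQLite through Python's sqlite3 module; the counts helper, the simplified two-column schema, and the sample names are assumptions for illustration, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE boss (boss_id INTEGER PRIMARY KEY, boss_name TEXT NOT NULL);
    CREATE TABLE subordinate (
        subordinate_id INTEGER PRIMARY KEY,
        boss_id        INTEGER NOT NULL
    );
    -- hypothetical data: John has two subordinates, Paul has none
    INSERT INTO boss VALUES (1, 'John'), (2, 'Paul');
    INSERT INTO subordinate VALUES (1, 1), (2, 1);
""")


def counts(join_kind):
    """Run the same grouped query with the given join type (INNER or LEFT)."""
    sql = f"""
        SELECT boss.boss_name,
               COUNT(1),
               COUNT(*),
               COUNT(subordinate.boss_id)
        FROM boss
        {join_kind} JOIN subordinate ON subordinate.boss_id = boss.boss_id
        GROUP BY boss.boss_id, boss.boss_name
        ORDER BY boss.boss_id
    """
    return conn.execute(sql).fetchall()


inner_rows = counts("INNER")  # Paul is dropped; all three counts agree
left_rows = counts("LEFT")    # Paul is kept; only COUNT(column) gives 0
print(inner_rows)
print(left_rows)
```

With the INNER JOIN all three counts agree, so COUNT(1) is harmless there. With the LEFT JOIN, COUNT(1) and COUNT(*) both report 1 for Paul, and only the column count reports 0.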

So just use either of the following:

-- count the subordinates that belong to each boss
SELECT boss.boss_id, COUNT(subordinate.boss_id)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.boss_id

This works in PostgreSQL and makes it clear that you want to count the cardinality of the set:

-- count the subordinates that belong to each boss
SELECT boss.boss_id, COUNT(subordinate.*)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.boss_id

Another way to count the cardinality of the set, very English-like (just don't create a column with the same name as its table): http://www.sqlfiddle.com/#!1/98515/7

select boss.boss_name, count(subordinate)
from boss
left join subordinate on subordinate.boss_code = boss.boss_code
group by boss.boss_name

You cannot do this: http://www.sqlfiddle.com/#!1/98515/8

select boss.boss_name, count(subordinate.1)
from boss
left join subordinate on subordinate.boss_code = boss.boss_code
group by boss.boss_name

You can do this, but this produces wrong result: http://www.sqlfiddle.com/#!1/98515/9

select boss.boss_name, count(1)
from boss
left join subordinate on subordinate.boss_code = boss.boss_code
group by boss.boss_name
jpaugh
Michael Buen
  • Thanks for the advice. Are there reasons behind those imperatives? – zneak Apr 26 '10 at 01:54
  • Imperatives regarding using `*` consistently? Yes there is, to make things simpler and consistent. In fact if SQL standard ruled that a field name could not have the same name as its table name, there could be only two forms of COUNT, one is `COUNT()` and the other is `COUNT(tableOrAliasedTablenameHere)`, life could be simpler. If we think about it, we don't count field, we count rows. And if SQL standard ruled that there could be no nullable fields in the database, we really won't need `COUNT(fieldnameHere)` construct at all. – Michael Buen Apr 26 '10 at 02:13
  • If it's only for the sake of consistency, then you could also use COUNT(1) instead of COUNT(*). Is one of them better than the other? – zneak Apr 26 '10 at 02:53
  • 4
    COUNT(1) looks like a magic number, one that is used when someone already has a grasp of what is going on under the hood. It could lead to abuse (i.e. if there's a malicious intention): since all of COUNT(0), COUNT(1), COUNT(2), COUNT(42) (you get the gist) are the same as COUNT(`*`), somebody could obfuscate the code and use COUNT(2), for example, so the next maintainer could have a hard time deducing what those COUNTs do. Someone will only start to use COUNT(1) when he/she already gleans that COUNT(1) is the same as COUNT(`*`). Nobody started their database career with COUNT(1) – Michael Buen Apr 26 '10 at 03:07
  • For consistency's sake, COUNT(*) is better; it doesn't have any arcane semantics compared to COUNT(1). Everybody starts with `SELECT * FROM tbl`. And it naturally occurs to them (and the rest of us) that when they need to get that query's count, they will just wrap COUNT around *. It wouldn't come naturally to us to use COUNT(1) – Michael Buen Apr 26 '10 at 03:18
  • +1 for pointing out that `COUNT(*)` and `*` are very different. Unless there's a WHERE clause filtering the result set, `COUNT(*)` will often be processed by the DBMS looking at the metadata it maintains for the number of rows in the table - never reading a row from the table. This does depend on the DBMS optimizer, but it is such a massive benefit that they all (or is it just 'almost all'?) do it. – Jonathan Leffler Apr 26 '10 at 04:22
  • yeah you are right about the count metadata. `SELECT COUNT(*) FROM tbl` will fetch metadata for the table's rows count, if the database isn't of MVCC type, notably MySQL's MyISAM, COUNT(*) is indeed very fast. On 'almost all', the database can't store row count metadata if it implements MVCC, e.g. MySQL InnoDB, Postgresql, Sql Server 2005(and up, though have to explicitly set the database as MVCC type) – Michael Buen Apr 26 '10 at 05:19
  • 5
    or from jester programmers, they could do: `SELECT COUNT('ME IN') FROM tbl`, for the thinking that like 1 in `COUNT(1)`, 'ME IN' will be ignored and optimized by the RDBMS also – Hao Apr 28 '10 at 02:30
  • Count(1) is counting the number of entries. For instance, "Select 1 from users" returns the number 1 in each row, for the number of rows in table users. Sometimes, depending on how the optimizer works, this could be optimized better, as the system can decide to use an index, and simply count nodes, instead of needing to count elements, which might require it to use the full table. – David Manheim May 04 '12 at 18:49
  • @DavidManheim `COUNT(*)` doesn't count elements, it sees past that. And `COUNT(1)` is a one-trick pony, cannot use it on LEFT JOINs http://www.ienablemuch.com/2010/04/debunking-myth-that-countdracula-is.html – Michael Buen May 05 '12 at 23:09
  • And regarding `SELECT 1` http://www.ienablemuch.com/2010/05/why-is-exists-select-1-cargo-cult.html – Michael Buen May 05 '12 at 23:13
  • @MichaelBuen I think that you are oversimplifying, and possibly off base. First, Count(1) does work for left joins in MSSQL - I just tried it. Second, comparing queries with no other differences, MSSMS tells me the count(1) has a lower cost for many complex queries than counting a specific column, it seems specifically in cases where the column is not listed first in the index that the query uses. – David Manheim May 07 '12 at 12:52
  • 2
    Of course it "does work"; the question is, does it work **properly**? If John has two subordinates, George and Ringo, and Paul doesn't have any, try to fashion `COUNT(1)` with `LEFT JOIN` so that it works properly and Paul's subordinate count is 0. Solve this first: http://www.sqlfiddle.com/#!1/98515/13 – Michael Buen May 07 '12 at 14:22
  • 2
    I emphasized this statement on my answer regarding using `COUNT(1)` on `LEFT JOIN`: **You can do this, but this produces wrong result**. Search this phrase on this page: **wrong result** – Michael Buen May 07 '12 at 14:26
  • You say `when you use joins, that trick [meaning count(1)] won't work on multi-table queries without its semantics being confused,` but it sounds like you're only talking about outer joins. Am I correct in thinking `count(1)` works just fine on inner joins? – Justin Morgan - On strike Aug 20 '15 at 13:07
  • @JustinMorgan yes count(1) works just fine on inner joins. for outer joins, count(1) can give incorrect result – Michael Buen Aug 21 '15 at 00:43
  • 1
    @MichaelBuen Very informative! But you seemed to always put your most convincing argument at the bottom of a section of text. I tried to change it to the pattern of: (1) Controversial assertion to grab attention, (2) Back it up with facts and examples. The section on syntax is interesting by itself, but almost irrelevant to the main point. I'd move it to the bottom, but I can't without a big rewrite. Again,very useful, thanks! – jpaugh Jun 06 '17 at 14:49
  • 1
    The very first sentence of this answer is already wrong! *"Use either COUNT(field) or COUNT(*), and stick with it consistently"* They are not the same thing. `COUNT(*)` and `COUNT(1)` are the same, but when you put a non-constant expression there, they're not. – Lukas Eder Sep 19 '19 at 12:25
62

Two of them always produce the same answer:

  • COUNT(*) counts the number of rows
  • COUNT(1) also counts the number of rows

Assuming the pk is a primary key and that no nulls are allowed in the values, then

  • COUNT(pk) also counts the number of rows

However, if pk is not constrained to be not null, then it produces a different answer:

  • COUNT(possibly_null) counts the number of rows with non-null values in the column possibly_null.

  • COUNT(DISTINCT pk) also counts the number of rows (because a primary key does not allow duplicates).

  • COUNT(DISTINCT possibly_null_or_dup) counts the number of distinct non-null values in the column possibly_null_or_dup.

  • COUNT(DISTINCT possibly_duplicated) counts the number of distinct (necessarily non-null) values in the column possibly_duplicated when that has the NOT NULL clause on it.
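
The cases listed above can be compared side by side in a small sketch. It uses an in-memory SQLite database through Python's sqlite3 module; the table name t and its four rows are invented for illustration, while the column names follow the list above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (
        pk                   INTEGER PRIMARY KEY,
        possibly_null        TEXT,
        possibly_duplicated  TEXT NOT NULL,
        possibly_null_or_dup TEXT
    );
    -- hypothetical rows: one NULL in possibly_null, duplicates elsewhere
    INSERT INTO t VALUES
        (1, 'a',  'x', 'p'),
        (2, NULL, 'x', 'p'),
        (3, 'c',  'y', NULL),
        (4, 'd',  'y', 'q');
""")

counts = conn.execute("""
    SELECT COUNT(*),                              -- all rows
           COUNT(1),                              -- all rows
           COUNT(pk),                             -- all rows (pk is never NULL)
           COUNT(possibly_null),                  -- rows with a non-NULL value
           COUNT(DISTINCT pk),                    -- all rows (pk is unique)
           COUNT(DISTINCT possibly_null_or_dup),  -- distinct non-NULL values
           COUNT(DISTINCT possibly_duplicated)    -- distinct values
    FROM t
""").fetchone()
print(counts)
```

With this data the seven counts are 4, 4, 4, 3, 4, 2 and 2, in the order of the SELECT list.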

Normally, I write COUNT(*); it is the original recommended notation for SQL. Similarly, with the EXISTS clause, I normally write WHERE EXISTS(SELECT * FROM ...) because that was the original recommended notation. There should be no benefit to the alternatives; the optimizer should see through the more obscure notations.

Piotr Dobrogost
Jonathan Leffler
  • 2
    I didn't even know `COUNT(DISTINCT)` worked, though it makes sense. Is it specific to a SQL flavor, or it's widely supported? – zneak Apr 26 '10 at 01:36
  • 2
    @zneak: COUNT(DISTINCT x) has been in SQL since SQL-86 (the first standard), so I would be surprised to find any SQL DBMS that did not support it. – Jonathan Leffler Apr 26 '10 at 01:40
9

Asked and answered before...

Books Online says "COUNT ( { [ [ ALL | DISTINCT ] expression ] | * } )"

"1" is a non-null expression so it's the same as COUNT(*). The optimiser recognises it as trivial so gives the same plan. A PK is unique and non-null (in SQL Server at least) so COUNT(PK) = COUNT(*)

This is a similar myth to EXISTS (SELECT * ... or EXISTS (SELECT 1 ...

And see the ANSI 92 spec, section 6.5, General Rules, case 1

        a) If COUNT(*) is specified, then the result is the cardinality
          of T.

        b) Otherwise, let TX be the single-column table that is the
          result of applying the <value expression> to each row of T
          and eliminating null values. If one or more null values are
          eliminated, then a completion condition is raised: warning-
          null value eliminated in set function.
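
The "non-null expression" reasoning and rule (b) above can be illustrated with a short sketch. It uses SQLite through Python's sqlite3 module rather than SQL Server, and the table tbl with its three rows is an assumption made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl (id INTEGER PRIMARY KEY);
    INSERT INTO tbl VALUES (1), (2), (3);   -- three hypothetical rows
""")

# Any constant non-NULL expression is evaluated once per row and is never
# eliminated, so it counts every row. A NULL expression is eliminated for
# every row, so it counts nothing.
result = conn.execute("""
    SELECT COUNT(*), COUNT(1), COUNT(42), COUNT('ME IN'), COUNT(NULL)
    FROM tbl
""").fetchone()
print(result)
```

The first four counts all equal the number of rows (3), and COUNT(NULL) is 0. The constant inside COUNT carries no meaning beyond being non-null.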
Community
gbn
6

At least on Oracle they are all the same: http://www.oracledba.co.uk/tips/count_speed.htm

ZeissS
3

I feel the performance characteristics change from one DBMS to another. It all depends on how they choose to implement it. Since I have worked extensively with Oracle, I'll speak from that perspective.

COUNT(*) - Fetches entire row into result set before passing on to the count function, count function will aggregate 1 if the row is not null

COUNT(1) - Will not fetch any row, instead count is called with a constant value of 1 for each row in the table when the WHERE matches.

COUNT(PK) - The PK in Oracle is indexed. This means Oracle has to read only the index. Normally one row in the index B+ tree is many times smaller than the actual row. So considering the disk IOPS rate, Oracle can fetch many times more rows from Index with a single block transfer as compared to entire row. This leads to higher throughput of the query.

From this you can see the first count is the slowest and the last count is the fastest in Oracle.

Paul
arunmur
  • 1
    Fortunately they have been sensible enough to change that after you left - http://www.oracledba.co.uk/tips/count_speed.htm – OrangeDog Feb 17 '11 at 11:20
  • I have my doubts about COUNT(*) fetching the entire row. I don't think the Oracle engine is dumber than SQL... – Sam Sep 06 '22 at 03:29