14

I have the following code:

SELECT <column>, count(*)
FROM <table>
GROUP BY <column> HAVING COUNT(*) > 1;

Is there any difference to the results or performance if I replace the COUNT(*) with COUNT('x')?

(This question is related to a previous one)

Community
  • 1
  • 1
Andrew
  • 12,991
  • 15
  • 55
  • 85
  • I believe this one has been answered in: https://stackoverflow.com/questions/59294/in-sql-whats-the-difference-between-countcolumn-and-count – Lars A. Brekken Sep 12 '08 at 15:36
  • That's very similar (and may indeed be the same answer), but I wondered if there is a difference between referencing a specific column (i.e. COUNT(column)) compared to referencing an arbitrary string (i.e. COUNT('x')). – Andrew Sep 12 '08 at 16:34

4 Answers4

18

To say that SELECT COUNT(*) vs COUNT(1) results in your DBMS returning "columns" is pure bunk. That may have been the case long, long ago but any self-respecting query optimizer will choose some fast method to count the rows in the table - there is NO performance difference between SELECT COUNT(*), COUNT(1), COUNT('this is a silly conversation')

Moreover, SELECT(1) vs SELECT(*) will NOT have any difference in INDEX usage -- most DBMS will actually optimize SELECT( n ) into SELECT(*) anyway. See the ASK TOM: Oracle has been optimizing SELECT(n) into SELECT(*) for the better part of a decade, if not longer: http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1156151916789

problem is in count(col) to count() conversion **03/23/00 05:46 pm *** one workaround is to set event 10122 to turn off count(col) ->count() optimization. Another work around is to change the count(col) to count(), it means the same, when the col has a NOT NULL constraint. The bug number is 1215372.

One thing to note - if you are using COUNT(col) (don't!) and col is marked NULL, then it will actually have to count the number of occurrences in the table (either via index scan, histogram, etc. if they exist, or a full table scan otherwise).

Bottom line: if what you want is the count of rows in a table, use COUNT(*)

Matt Rogish
  • 24,435
  • 11
  • 76
  • 92
  • It's not correct to say there's not a difference between select(n) and select(*). If you have a covering index that includes n, you get the data straight from the leaf level of the index and don't have to go back to the table, which is much faster. – Eric Z Beard Sep 12 '08 at 16:24
  • 1
    The DBMS optimizer *will* realize this, and choose the correct index for the job. Provided there is an index, rare is the day that I've seen a DBMS actually **count** rows on the table. Moreover, the presence of NULLs often cause semantic bugs. When you want the # of rows in a table, use COUNT(*)!!! – Matt Rogish Sep 12 '08 at 16:28
3

The major performance difference is that COUNT(*) can be satisfied by examining the primary key on the table.

i.e. in the simple case below, the query will return immediately, without needing to examine any rows.

select count(*) from table

I'm not sure if the query optimizer in SQL Server will do so, but in the example above, if the column you are grouping on has an index the server should be able to satisfy the query without hitting the actual table at all.

To clarify: this answer refers specifically to SQL Server. I don't know how other DBMS products handle this.

Brannon
  • 25,687
  • 5
  • 39
  • 44
3

This question is slightly different that the other referenced. In the referenced question, it was asked what the difference was when using count(*) and count(SomeColumnName), and SQLMenace's answer was spot on.

To address this question, essentially there is no difference in the result. Both count(*) and count('x') and say count(1) will return the same number. The difference is that when using " * " just like in a SELECT all columns are returned, then counted. When a constant is used (e.g. 'x' or 1) then a row with one column is returned and then counted. The performance difference would be seen when " * " returns many columns.

Update: The above statement about performance is probably not quite right as discussed in other answers, but does apply to subselect queries when using EXISTS and NOT EXISTS

Community
  • 1
  • 1
NateSchneider
  • 205
  • 1
  • 2
  • 9
  • Does that mean COUNT('x') would be faster if the table had many columns, compared to COUNT(*)? – Andrew Sep 12 '08 at 16:07
  • I think this behavior depends on the database and the query optimization applied. It's an obvious optimization to perform when you see COUNT(*). It can only mean one thing, you want the total count of rows, regardless of how many columns the table has. – Brannon Sep 12 '08 at 16:11
1

MySQL: According to the MySQL website, COUNT(*) is faster for single table queries when using MyISAM:

http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html#function_count

I'm guessing with a having clause with a count in it may change things.

Darryl Hein
  • 142,451
  • 95
  • 218
  • 261