count(column_name)
is not inaccurate, it's simply something completely different than count(*).
The SQL standard defines count(column_name)
as equivalent to count(*) where column_name IS NOT NULL
. To the result is bound to be different if column_name is nullable.
In Oracle (and possibly other DBMS as well), count(*)
will use an available index on a not null column to count the rows (e.g. PK index). So it will be just as fas
Additionally there is nothing similar to the rowid in SQL Server or MySQL (in PostgreSQL it would be ctid
).
Do use count(*)
. It's the best option to get the row count. Let the DBMS do any optimization in the background if adequate indexes are available.
Edit
A quick demo on how Oracle automatically uses an index if available and how that reduces the amount of work done by the database:
The setup of the test table:
create table foo (id integer not null, c1 varchar(2000), c2 varchar(2000));
insert into foo (id, c1, c2)
select lvl, c1, c1 from
(
select level as lvl, dbms_random.string('A', 2000) as c1
from dual
connect by level < 10000
);
That generates 10000 rows with each row filling up some space in order to make sure the table has a realistic size.
Now in SQL*Plus I run the following:
SQL> set autotrace traceonly explain statistics;
SQL> select count(*) from foo;
Execution Plan
----------------------------------------------------------
Plan hash value: 1342139204
-------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2740 (1)| 00:00:33 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | TABLE ACCESS FULL| FOO | 9999 | 2740 (1)| 00:00:33 |
-------------------------------------------------------------------
Statistics
----------------------------------------------------------
181 recursive calls
0 db block gets
10130 consistent gets
0 physical reads
0 redo size
430 bytes sent via SQL*Net to client
420 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
5 sorts (memory)
0 sorts (disk)
1 rows processed
SQL>
As you can see a full table scan is done on the table which requires 10130 "IO Operations" (I know that that is not the right term, but for the sake of the demo it should be a good enough explanation for someone never seen this before)
Now I create an index on that column and run the count(*) again:
SQL> create index i1 on foo (id);
Index created.
SQL> select count(*) from foo;
Execution Plan
----------------------------------------------------------
Plan hash value: 129980005
----------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
----------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 7 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | INDEX FAST FULL SCAN| I1 | 9999 | 7 (0)| 00:00:01 |
----------------------------------------------------------------------
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
27 consistent gets
21 physical reads
0 redo size
430 bytes sent via SQL*Net to client
420 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL>
As you can see Oracle did use the index on the (not null!) column and the amount of IO went drastically down (from 10130 to 27 - not something I'd call "grossly ineffecient").
The "physical reads" stem from the fact that the index was just created and was not yet in the cache.
I would expect other DBMS to apply the same optimizations.