0

We assume that there is no primary key defined for a table T. In that case, how does one count all the rows in T quickly/efficiently for these databases - Oracle 11g, MySql, Mssql ?

It seems that count(*) and count(column_name) can be slow and inaccurate respectively. The following seems to be the fastest and most reliable way to do it-

select count(rowid) from MySchema.TableInMySchema;

Can you tell me if the above statement also has any shortcomings ? If it is good, then do we have similar statements for mysql and mssql ?

Thanks in advance.

Source - http://www.thewellroundedgeek.com/2007/09/most-people-use-oracle-count-function.html

bread butter
  • 617
  • 3
  • 10
  • 18
  • http://stackoverflow.com/questions/1221559/count-vs-count1 - just use `count(*)` – Mat Aug 26 '12 at 07:59

3 Answers3

3

count(column_name) is not inaccurate, it's simply something completely different than count(*).

The SQL standard defines count(column_name) as equivalent to count(*) where column_name IS NOT NULL. To the result is bound to be different if column_name is nullable.

In Oracle (and possibly other DBMS as well), count(*) will use an available index on a not null column to count the rows (e.g. PK index). So it will be just as fas

Additionally there is nothing similar to the rowid in SQL Server or MySQL (in PostgreSQL it would be ctid).

Do use count(*). It's the best option to get the row count. Let the DBMS do any optimization in the background if adequate indexes are available.

Edit

A quick demo on how Oracle automatically uses an index if available and how that reduces the amount of work done by the database:

The setup of the test table:

create table foo (id integer not null, c1 varchar(2000), c2 varchar(2000));
insert into foo (id, c1, c2)
select lvl, c1, c1 from 
(
  select level as lvl, dbms_random.string('A', 2000) as c1
  from dual 
  connect by level < 10000
);

That generates 10000 rows with each row filling up some space in order to make sure the table has a realistic size.

Now in SQL*Plus I run the following:

SQL> set autotrace traceonly explain statistics;
SQL> select count(*) from foo;


Execution Plan
----------------------------------------------------------
Plan hash value: 1342139204

-------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |  2740   (1)| 00:00:33 |
|   1 |  SORT AGGREGATE    |      |     1 |            |          |
|   2 |   TABLE ACCESS FULL| FOO  |  9999 |  2740   (1)| 00:00:33 |
-------------------------------------------------------------------


Statistics
----------------------------------------------------------
        181  recursive calls
          0  db block gets
      10130  consistent gets
          0  physical reads
          0  redo size
        430  bytes sent via SQL*Net to client
        420  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          5  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL>

As you can see a full table scan is done on the table which requires 10130 "IO Operations" (I know that that is not the right term, but for the sake of the demo it should be a good enough explanation for someone never seen this before)

Now I create an index on that column and run the count(*) again:

SQL> create index i1 on foo (id);

Index created.

SQL> select count(*) from foo;


Execution Plan
----------------------------------------------------------
Plan hash value: 129980005

----------------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Cost (%CPU)| Time     |
----------------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |     1 |     7   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE       |      |     1 |            |          |
|   2 |   INDEX FAST FULL SCAN| I1   |  9999 |     7   (0)| 00:00:01 |
----------------------------------------------------------------------


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         27  consistent gets
         21  physical reads
          0  redo size
        430  bytes sent via SQL*Net to client
        420  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL>

As you can see Oracle did use the index on the (not null!) column and the amount of IO went drastically down (from 10130 to 27 - not something I'd call "grossly ineffecient").

The "physical reads" stem from the fact that the index was just created and was not yet in the cache.

I would expect other DBMS to apply the same optimizations.

  • The link which i shared says this - select count(*) from my_table; While this does technically work, on larger tables it can be grossly inefficient to perform this query. Is this really true ? If so, how can one check if it is true or not ? – bread butter Aug 26 '12 at 08:05
  • @breadbutter: I don't know why it would "grossly ineffecient". Counting all rows in a table is not something that is going to be fast especially with large tables (as the work required is direct proportional to the number of rows). You can check the execution plan to see if it's using an index or not. –  Aug 26 '12 at 08:10
  • I am new to databases and dont know what this means - You can check the execution plan to see if it's using an index or not. Can you suggest where i read about it ? – bread butter Aug 26 '12 at 08:22
1

In Oracle, COUNT(*) is the most efficient. Realistically, COUNT(rowid), COUNT(1), or COUNT('fuzzy bunny') are likely to be equally efficient. But if there is a difference, COUNT(*) will be more efficient.

Justin Cave
  • 227,342
  • 24
  • 367
  • 384
0

i EVER use SELECT COUNT(1) FROM anything;, instead of the asterisk...

some people are of the opinion, that mysql uses the asterisk to invoke the query-optimizer and ignores any optimizing when use of "1" as static scalar...

imho, this is straight-forward, because you don't use any variable and it's clear, that you only count all rows.

TheHe
  • 2,933
  • 18
  • 22