10

I'm looking for the best way to count how many rows there are in a large (15 million+ rows) table. The naive way of select count(*) from table; is apparently O(n) according to a few older posts I've found on the matter, e.g. http://osdir.com/ml/sqlite-users/2010-07/msg00437.html.

Is there a constant time mechanism to get this information, or failing that are there preferred alternatives to the straightforward select count(*) query?

dlanod
  • 8,664
  • 8
  • 54
  • 96
  • How can it be `O(n)` if the documentation states it uses a B Tree?[Official documentation](http://www.sqlite.org/arch.html) – Mosty Mostacho Sep 15 '13 at 01:28
  • 1
    @MostyMostacho how can you visit *every* node of a tree in less than `O(n)` time? – hobbs Sep 15 '13 at 02:06
  • You just let every node have a count of the subtree it is parent of as you can see in this [link](http://www.chiark.greenend.org.uk/~sgtatham/algorithms/cbtree.html). Anyway, if the count is not returning immediately then it seems SQLite is using a very simple B Tree implementation. – Mosty Mostacho Sep 15 '13 at 05:35
  • @MostyMostacho That would require an upwards traversal of the b-tree for every insert, which would use more IO, probably cause a bunch of disk seeks, and generally slow things down. – Colonel Thirty Two Sep 15 '13 at 16:18

6 Answers6

5

SQLite has a special optimization for COUNT(*) without a WHERE clause, where it goes through the table's B-tree pages and counts entries without actually loading the records. However, this still requires that all the table's data (except overflow pages for large records) is visited, so the runtime is still O(n).

SQLite does not store a separate record count in the database because that would make all changes slower.

CL.
  • 173,858
  • 17
  • 217
  • 259
3

as a workaround you could query ROWID. if you don't delete from the table it'll be accurate otherwise it will be high

select max(rowid) from table
gordy
  • 9,360
  • 1
  • 31
  • 43
  • I have a table with ~900M rows. select count(*) took 28.5 seconds. select max(rowid) took 0.003 seconds. Same answer. – Marc Meketon Sep 15 '22 at 04:46
3

No, it is not constant time.

sqlite> CREATE TABLE test ( a );
sqlite> EXPLAIN QUERY PLAN SELECT COUNT(*) FROM test;
0|0|0|SCAN TABLE test (~1000000 rows)
sqlite> EXPLAIN QUERY PLAN SELECT COUNT(1) FROM test;
0|0|0|SCAN TABLE test (~1000000 rows)

You can use EXPLAIN QUERY PLAN SELECT ... to get an idea of the performance of a query.

Colonel Thirty Two
  • 23,953
  • 8
  • 45
  • 85
1

It's a great question. I wish there was a catalog for SQLite containing tables' row count. select count(*) from table; is your best bet O(n). You could check performance of select count(1) from table; in comparison to count(*). I speculate that both count(1) and count( * ) will give you similar speeds. Unfortunately in SQLite, there aren't any alternatives to count(*) to get row count.

SQL Server, on the other hand, has sys.dm_db_partition_stats that can be really helpful.

Community
  • 1
  • 1
zedfoxus
  • 35,121
  • 5
  • 64
  • 63
1

While agreeing with the other answers that it is not constant time, one interesting and non-obvious performance improvement for select count(*) is to add an index if you don't have one. This can be on any arbitrary column, and on my system reduces the time of the query by 75% (ish).

sqlite> select count(*) from TestTable;
15035000
CPU Time: user 0.468003 sys 4.368028

sqlite> select count(*) from TestTable;
15035000
CPU Time: user 0.561604 sys 4.290027

sqlite> select count(*) from TestTable;
15035000
CPU Time: user 0.483603 sys 4.368028

sqlite> explain query plan select count(*) from TestTable;
0|0|0|SCAN TABLE TestTable (~1000000 rows)

sqlite> create index test_index on TestTable(Pointer);

sqlite> select count(*) from TestTable;
15035000
CPU Time: user 0.062400 sys 0.780005

sqlite> select count(*) from TestTable;
15035000
CPU Time: user 0.187201 sys 0.655204

sqlite> select count(*) from TestTable;
15035000
CPU Time: user 0.140401 sys 0.748805

sqlite> explain query plan select count(*) from TestTable;
0|0|0|SCAN TABLE TestTable USING COVERING INDEX test_index(~1000000 rows)
dlanod
  • 8,664
  • 8
  • 54
  • 96
  • it is weird, because there is an index for the RowID primary key, isn't it? or perhaps it is not a true physical index, but how the BTree works and stores its IDs... nice finding! – Arnaud Bouchez Feb 09 '18 at 10:10
0

I believe you can use the sqlite_stat1 table to retrieve the number of rows in your table after having run ANALYZE table:

The first integer in this list is the number of rows in the index and in the table.

The statistics in this table do not update along with your data, so they become less accurate as the table changes. How useful this is depends on your use case.

It is probable that ANALYZE takes the same amount of time as COUNT(*), however it does the work of caching the results (and some other statistics) for you.

Chris Bandy
  • 1,498
  • 13
  • 7