3

I have a large database table. I need to to know exactly how many rows the table contains in very short time. The user will request the number everytime, on every page of my website. Its not a rare request.

I have following methods:

  1. Using primary key, so by using MAX(id) I would already have the row-count. But it is O(n). On my table with > 1mio rows its slow.
  2. I could increase a number in a text file each time an Item gets inserted
  3. I could switch to MyISAM since it saves the row-COUNT, but problem is its slower in selecting. In my app I mostly select large amounts.
  4. SELECT COUNT(*) FROM table / SELECT * FROM table ORDER BY id DESC LIMIT 1; But they are very slow or messy for very large database tables.

Which method should I use, or are there better methods?

Volkan Y
  • 119
  • 1
  • 8
  • 1
    An *approximate* count can be had from [information_schema.tables](https://stackoverflow.com/a/33226386/14660). – Schwern May 23 '20 at 19:56

2 Answers2

3

If you need perfect accuracy, select count(id) from table or select count(1) from table is the only way. And I'd recommend against switching to MyISAM.

If you can stand an approximation, table_rows from information_schema.tables. I find this sufficient for UI purposes such as "Showing 40 of 10393".

Some storage engines, such as MyISAM, store the exact count. For other storage engines, such as InnoDB, this value is an approximation, and may vary from the actual value by as much as 40% to 50%. In such cases, use SELECT COUNT(*) to obtain an accurate count.

select table_rows
from information_schema.tables
where tables.table_schema = 'your database name'
  and tables.table_name = 'your table';

Finally, you can cache the value from select count(id) and periodically update it. Whether this will be more accurate than information_schema.tables I can't say. You'll have to do some testing. Caching can greatly speed up count queries which also have a filter like select count(id) from table where this = 'that'.


You might be tempted to use the auto_increment value from information_schema.tables. This can only provide you with an upper bound. It will always over-estimate the number of rows because...

  • Rows get deleted.
  • The auto_increment value can advance without inserting anything.
  • Manually inserting an id will move the auto_increment value to id + 1.

For example, insert into select will do this, as will inserting and rolling back.

Schwern
  • 153,029
  • 25
  • 195
  • 336
1

If you never delete rows from the table and you have an auto_increment PK and auto_increment increment/offset of 1, do:

SHOW CREATE TABLE table_name;

and parse the auto_increment value out of that. Every other way will be either slow (count()/max()) or an approximation (SHOW INDEX FROM table_name; or from (information_schema.TABLES).

Caveat: The SHOW CREATE TABLE will not be accurate if you delete rows from the table or have INSERT+ROLLBACK or experience deadlocks against INSERTs (that will allocate auto_increment values but not add actual records.

Gordan Bobić
  • 1,748
  • 13
  • 16
  • No need to parse the `create table`, you can get the next `auto_increment` value from [`information_schema.tables`](https://dev.mysql.com/doc/refman/8.0/en/tables-table.html). However, this become less and less accurate as rows are deleted. It's also easy for auto_increment to advance without inserting a row. `table_rows` would be more accurate. – Schwern May 23 '20 at 20:00
  • `information_schema.TABLES.TABLE_ROWS` contains the same approximation for PK cardinality as `SHOW INDEX FROM table_name`. It is an approximation based on an index dive sample. The auto_increment value from `SHOW CREATE TABLE` will be accurate, unless you delete some rows or do INSERT+ROLLBACK or get deadlocks with INSERTs. – Gordan Bobić May 23 '20 at 20:07
  • `information_schema.tables.auto_increment` should be the same value from `show create table`, no parsing necessary. `auto_increment` is only an upper bound. – Schwern May 23 '20 at 20:13