
This fairly obvious question seems to have very few (I couldn't find any) solid answers.

I run a simple select on a table of 2 million rows:

select count(id) as total from big_table

On any machine I try this query on, it usually takes at least 5 seconds to complete. That is unacceptable for real-time queries.

The reason I need an exact row count is for precise statistical calculations later on.

Using the last auto-increment value is unfortunately not an option, because rows also get deleted periodically.

  • Don't see the relation with mysqli and/or PHP. Edit your question to add relevant info or remove the tags. – Alfabravo Dec 29 '15 at 13:45
  • [This one](http://stackoverflow.com/questions/4871747/mysql-count-performance) has an accepted answer. No luck? – Alfabravo Dec 29 '15 at 14:06
  • @Alfabravo it seems the best practice is to store the value in a separate table and add triggers that update that value (InnoDB case). – John Doe Dec 29 '15 at 15:50

1 Answer

It can indeed be slow when running on an InnoDB engine. As stated in section 14.24 of the MySQL 5.7 Reference Manual, “InnoDB Restrictions and Limitations”, 3rd bullet point:

InnoDB does not keep an internal count of rows in a table because concurrent transactions might “see” different numbers of rows at the same time. Consequently, SELECT COUNT(*) statements only count rows visible to the current transaction.

For information about how InnoDB processes SELECT COUNT(*) statements, refer to the COUNT() description in Section 12.20.1, “Aggregate Function Descriptions”.

The suggested solution is a counter table: a separate table with a single row and column that holds the current record count, kept up to date via triggers. Something like this:

create table big_table_count (rec_count int default 0);
-- one-shot initialisation:
insert into big_table_count select count(*) from big_table;

create trigger big_insert after insert on big_table
    for each row
    update big_table_count set rec_count = rec_count + 1;

create trigger big_delete after delete on big_table
    for each row
    update big_table_count set rec_count = rec_count - 1;

There is a fiddle here, where you can alter the insert/delete statements in the build section to see the effect on:

select rec_count from big_table_count;

You could extend this to several tables, either by creating such a counter table for each of them, or by reserving one row per table in the above counter table, keyed by a column "table_name"; see the sketch below.
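
A minimal sketch of that keyed variant, assuming a hypothetical shared table named all_table_counts (the table and column names are illustrative, not part of the original setup):

create table all_table_counts (
    table_name varchar(64) primary key,
    rec_count  int not null default 0
);
-- one-shot initialisation for big_table:
insert into all_table_counts (table_name, rec_count)
    select 'big_table', count(*) from big_table;

-- these would replace the single-table triggers shown above:
create trigger big_insert after insert on big_table
    for each row
    update all_table_counts set rec_count = rec_count + 1
    where table_name = 'big_table';

create trigger big_delete after delete on big_table
    for each row
    update all_table_counts set rec_count = rec_count - 1
    where table_name = 'big_table';

-- current count for big_table:
select rec_count from all_table_counts where table_name = 'big_table';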

Improving concurrency

The above method does have an impact if you have many concurrent sessions inserting or deleting records, because they need to wait for each other to complete the update of the counter.

A solution is not to let the triggers update the same single record, but instead have each trigger insert a new record, like this:

create trigger big_insert after insert on big_table
    for each row
    insert into big_table_count (rec_count) values (1);

create trigger big_delete after delete on big_table
    for each row
    insert into big_table_count (rec_count) values (-1);

The way to get the count then becomes:

select sum(rec_count) from big_table_count;

Then, once in a while (e.g. daily) you should re-initialise the counter table to keep it small:

truncate table big_table_count;
insert into big_table_count select count(*) from big_table;