SQL COUNT query with joins: performance

Question

I have the following tables (example):

t1 (20,000 rows, 60 columns, primary key t1_id)
t2 (40,000 rows, 8 columns, primary key t2_id)
t3 (50,000 rows, 3 columns, primary key t3_id)
t4 (30,000 rows, 4 columns, primary key t4_id)

SQL query:

SELECT COUNT(*) AS count FROM (t1)
JOIN t2 ON t1.t2_id = t2.t2_id
JOIN t3 ON t2.t3_id = t3.t3_id
JOIN t4 ON t3.t4_id = t4.t4_id

I have created indexes on the columns used in the joins (e.g. on t1.t2_id) and foreign keys where necessary. The query is slow (600 ms), and if I add WHERE clauses (e.g. WHERE t1.column10 = 1, where column10 has no index), it becomes much slower. Queries using SELECT * with LIMIT are fast, so I can't understand the behaviour of COUNT. Any solution?

EDIT: EXPLAIN output added

id  select_type  table  type   possible_keys  key      key_len  ref       rows  Extra
1   SIMPLE       t4     index  PRIMARY        user_id  4        NULL      5259  Using index
1   SIMPLE       t2     ref    PRIMARY,t4_id  t4_id    4        t4.t4_id  1     Using index
1   SIMPLE       t1     ref    t2_id          t2_id    4        t2.t2_id  1     Using index
1   SIMPLE       t3     ref    PRIMARY        PRIMARY  4        t2.t2_id  1     Using index

Here user_id is a column of the t4 table.

EDIT: I changed from InnoDB to MyISAM and got a speed increase, especially with WHERE clauses, but I still see times of 100-150 ms. The reason I want the count in my application is to show the user who is filling in a search form, via AJAX, the number of results to expect. Maybe there is a better solution, for example creating a temporary table that is updated every hour?
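
For illustration, the "table updated every hour" idea could look roughly like the sketch below. It assumes MySQL's event scheduler is enabled and that column10 is a non-NULL integer; the table name search_count_cache and the event name are hypothetical, not part of the original schema.

```sql
-- Hypothetical cache table holding one precomputed count per column10 value.
-- Assumes column10 is a NOT NULL integer.
CREATE TABLE search_count_cache (
    column10     INT NOT NULL PRIMARY KEY,
    result_count INT UNSIGNED NOT NULL,
    refreshed_at DATETIME NOT NULL
);

-- Needs the event scheduler to be running: SET GLOBAL event_scheduler = ON;
-- REPLACE overwrites the existing row for each column10 value on every run.
CREATE EVENT refresh_search_count_cache
ON SCHEDULE EVERY 1 HOUR
DO
  REPLACE INTO search_count_cache (column10, result_count, refreshed_at)
  SELECT t1.column10, COUNT(*), NOW()
  FROM t1
  JOIN t2 ON t1.t2_id = t2.t2_id
  JOIN t3 ON t2.t3_id = t3.t3_id
  JOIN t4 ON t3.t4_id = t4.t4_id
  GROUP BY t1.column10;

-- The AJAX request then reads a single row by primary key.
SELECT result_count FROM search_count_cache WHERE column10 = 1;
```

The trade-off is staleness: counts can be up to an hour old, and rows for column10 values that no longer occur in t1 are not removed by this sketch.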

Please run MySQL's `EXPLAIN` on your query and let us know the results. – J. Bruni Sep 16 '12 at 12:05
For the sake of performance some compromises are allowed, such as denormalisation. For example, you can keep the number of detail rows within each master row. So if your tables form a 4-level tree, each node keeps the number of its children on every level. It makes updates harder (you can use a stored procedure), but counting then works by querying one table. – WojtusJ Sep 17 '12 at 07:23
It would be much easier if you placed a sample of your database on [sqlfiddle](http://sqlfiddle.com). – WojtusJ Sep 17 '12 at 16:12
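
A minimal sketch of the denormalisation suggested in the comment above, for one level of the tree only. It swaps in triggers instead of a stored procedure, and the column name t1_count and the trigger names are hypothetical. It does not handle updates that move a t1 row to a different t2_id.

```sql
-- Hypothetical counter: how many t1 rows reference each t2 row.
ALTER TABLE t2 ADD COLUMN t1_count INT UNSIGNED NOT NULL DEFAULT 0;

-- One-off backfill from the current data.
UPDATE t2
SET t1_count = (SELECT COUNT(*) FROM t1 WHERE t1.t2_id = t2.t2_id);

-- Keep the counter in step with inserts and deletes on t1.
CREATE TRIGGER t1_after_insert AFTER INSERT ON t1
FOR EACH ROW
  UPDATE t2 SET t1_count = t1_count + 1 WHERE t2_id = NEW.t2_id;

CREATE TRIGGER t1_after_delete AFTER DELETE ON t1
FOR EACH ROW
  UPDATE t2 SET t1_count = t1_count - 1 WHERE t2_id = OLD.t2_id;

-- The unfiltered count no longer needs to touch t1 at all.
-- SUM returns NULL when no rows match, hence the COALESCE.
SELECT COALESCE(SUM(t2.t1_count), 0) AS count
FROM t2
JOIN t3 ON t2.t3_id = t3.t3_id
JOIN t4 ON t3.t4_id = t4.t4_id;
```

This only helps counts that do not filter on t1 columns; a WHERE on t1.column10 would still have to read t1.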

WojtusJ · Answer 1 · 2012-09-16T13:06:42.427

The count query is simply faster because of an INDEX ONLY SCAN, as stated in the query plan. The query you mention involves only indexed columns, which is why there is no need to touch the physical data during execution: the whole query is performed on indexes. When you add a clause involving columns that are not indexed, or indexed in a way that prevents index usage, the data stored in the heap table has to be accessed by physical address, which is very slow.

EDIT: Another important thing is that those are PKs, so they are UNIQUE. The optimizer chooses to perform an INDEX RANGE SCAN on the first index and only checks whether the keys exist in the subsequent indexes (that's why the plan states that only one row will be returned).

EDIT2: Thanks to J. Bruni: in fact that is a clustered index, so the above isn't the "whole truth". There is probably a full scan on the first table, followed by three INDEX ACCESSes to confirm that the FK exists.
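
To illustrate the point about non-indexed columns: one way to keep the filtered count on indexes only is a composite index that contains both the filter column and the join column. This is a sketch, not a guaranteed fix; the index name is hypothetical, and whether the optimizer actually uses it depends on the data.

```sql
-- Hypothetical composite index: filter column first, join column second,
-- so t1 can be filtered and joined without reading the table rows.
CREATE INDEX idx_t1_column10_t2_id ON t1 (column10, t2_id);

-- Re-check the plan; if the index is chosen, the t1 row should show
-- "Using index" in the Extra column.
EXPLAIN
SELECT COUNT(*) AS count FROM t1
JOIN t2 ON t1.t2_id = t2.t2_id
JOIN t3 ON t2.t3_id = t3.t3_id
JOIN t4 ON t3.t4_id = t4.t4_id
WHERE t1.column10 = 1;
```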

I added a sample of my real database! – user666 Sep 17 '12 at 09:06

score 0 · Answer 2 · answered Sep 16 '12 at 12:22

COUNT iterates over the whole result set and does not depend on indexes. Use EXPLAIN ANALYSE on your query to check how it is executed.

SELECT with LIMIT does not iterate over the whole result set, hence it's faster.

Anton

Not true: while counting rows, a DBMS can perform an INDEX ONLY scan and not even touch the HEAP TABLE, in particular cases of course. – WojtusJ Sep 16 '12 at 12:31
I don't clearly understand how it is possible to perform an index-only scan to count the rows of a result set that is obtained by joining tables. – Anton Sep 16 '12 at 12:36
Well, this is tricky in this particular case. :-) Note that only PKs are involved, so we can be sure that they are UNIQUE. All we have to do is an INDEX RANGE SCAN on the first table (T4, the smallest one) and check whether the key exists in the subsequent indexes (just INDEX ACCESS, not even a RANGE SCAN, because of UNIQUENESS). That's why the plan states that only one row is returned in the subsequent operations. This is the power of proper indexing and a well-built optimizer. – WojtusJ Sep 16 '12 at 12:56

score -1 · Accepted Answer · edited May 23 '17 at 11:49

Regarding the slow COUNT(*) performance: are you using the InnoDB engine? See:

The main information seems to be: "InnoDB uses clustered primary keys, so the primary key is stored along with the row in the data pages, not in separate index pages."

So, one possible solution is to create a separate index and force its usage through a USE INDEX hint in the SQL query. Look at this comment for a sample usage report:

http://www.mysqlperformanceblog.com/2006/12/01/count-for-innodb-tables/comment-page-1/#comment-529049
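
For reference, the USE INDEX idea might look like the following sketch on a single table. The index name idx_t1_id is hypothetical, and USE INDEX is only a hint; FORCE INDEX is the stronger variant if the optimizer ignores it.

```sql
-- Hypothetical secondary index on the primary-key column. In InnoDB a
-- secondary index is stored separately from the row data, so scanning it
-- reads far less than scanning the clustered primary key.
CREATE INDEX idx_t1_id ON t1 (t1_id);

-- Ask the optimizer to count via the secondary index.
SELECT COUNT(*) AS count
FROM t1 USE INDEX (idx_t1_id);
```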

Regarding the WHERE issue, the query will perform better if you put the condition in the JOIN clause, like this:

SELECT COUNT(t1.t1_id) AS count FROM (t1)
JOIN t2 ON (t1.column10 = 1) AND (t1.t2_id = t2.t2_id)
JOIN t3 ON t2.t3_id = t3.t3_id
JOIN t4 ON t3.t4_id = t4.t4_id

answered Sep 16 '12 at 12:22

J. Bruni

Not true: there is a difference only for OUTER JOINs; for INNER ones there is no difference. The optimizer will probably choose to filter the T1 table first anyway. – WojtusJ Sep 16 '12 at 12:33
I have used COUNT(t1.t1_id) and it makes no difference. Putting WHERE clauses in the joins speeds things up a little, but the main problem remains. – user666 Sep 16 '12 at 12:34
Interesting resource found while researching, worth taking a look: http://www.lullabot.com/articles/slow-queries-check-cardinality-your-mysql-indexes – J. Bruni Sep 16 '12 at 13:07
Actually I changed to MyISAM and optimized my queries to do only the necessary joins each time. Now I have an average time of 100 ms per count query, which still seems slow to me for the table size. See also my edits in the post :) – user666 Sep 17 '12 at 01:32
I added a sample of my real database! – user666 Sep 17 '12 at 09:05
