0

I have a big table (around 70milion records) and I want to get some result in a order from database.

Here is my query

SELECT * 
FROM `comment` 
WHERE `account_id` IN ('accountId1','accountId2') 
ORDER BY `date` desc, `id` desc 
LIMIT 20; 

I'm trying to get comments for specific account and I need to order them by date and if they have same date I need to order them by id desc.

I added index to improve the query like this:

ALTER TABLE `comment` 
ADD INDEX `account_id` (`account_id`, `date`, `id`); 

My problem is it used filesort for sorting result and it's not using index.

Here is the explain query


+----+-------------+--------------------+------------+-------+---------------+------------+---------+------+-------+----------+---------------------------------------+
| id | select_type | table              | partitions | type  | possible_keys | key        | key_len | ref  | rows  | filtered | Extra                                 |
+----+-------------+--------------------+------------+-------+---------------+------------+---------+------+-------+----------+---------------------------------------+
|  1 | SIMPLE      | np_account_comment | NULL       | range | account_id    | account_id | 5       | NULL | 20757 |   100.00 | Using index condition; Using filesort |
+----+-------------+--------------------+------------+-------+---------------+------------+---------+------+-------+----------+---------------------------------------+

In the explain it said key_len is 4 so its only use first part of index (account_id) as index and its not using date and id for sorting.

Here is what I understood

  1. Its not depend on order by desc, I tested on order by asc and it has same result.
  2. The problem probably is because of account_id IN ('accountId1','accountId2') because when I using ``account_id = 'accountId1' I have these result

+----+-------------+--------------------+------------+------+----------------------+------------+---------+-------------+------+----------+---------------------+
| id | select_type | table              | partitions | type | possible_keys        | key        | key_len | ref         | rows | filtered | Extra               |
+----+-------------+--------------------+------------+------+----------------------+------------+---------+-------------+------+----------+---------------------+
|  1 | SIMPLE      | comment            | NULL       | ref  | account_id           | account_id | 4       | const,const |    1 |   100.00 | Backward index scan |
+----+-------------+--------------------+------------+------+----------------------+------------+---------+-------------+------+----------+---------------------+

Still key_len is 5 but in extra it doesnt said filesort

How I can improve my query speed? sometime it take 20sec to complete.

If I'm current when mysql used full index the key_len should be 13 ( 4(INT) * 2 + 5 (DATETIME) )

Ali Akbar Azizi
  • 3,272
  • 3
  • 25
  • 44
  • How many rows are returned for each account_id? (MySQL assumes it is about "5", is that in the right ballpark?) – Solarflare Sep 02 '23 at 13:39
  • @Solarflare no its not, i use explain on sample data, the real result depend on account, it can be 0 or it can be 100k or more. – Ali Akbar Azizi Sep 02 '23 at 13:49
  • In your question you say you are trying to get records for a specific account and user, but there is no reference to user in your SQL? – NickW Sep 02 '23 at 13:51
  • @NickW I changed my question, account is belong to user, so its enough to filter comments base on account. – Ali Akbar Azizi Sep 02 '23 at 13:52
  • Please update your question with the EXPLAIN PLAN for the actual query, not for one that uses sample data – NickW Sep 02 '23 at 13:56
  • @NickW there is no actual query, the account_id is depend on which user trying to run the query, I updated my answer for my accounts but this is not always true it could be more or less with different accounts. – Ali Akbar Azizi Sep 02 '23 at 14:05
  • Well, let's assume we find hundred rows, fifty for accountId1, fifty for accountId2. The DBMS *could* use the date and id found in the index, but this is two data sets that it would have merge / sort to get the final order. It may be easier to just look at the rows and sort them. I suppose that the optimizer "thinks" along these lines. – Thorsten Kettner Sep 02 '23 at 14:14
  • @ThorstenKettner I used `force index for order by` but not working, I can't change my query this is the data I need to show to users, how I supposed to improve this? – Ali Akbar Azizi Sep 02 '23 at 14:21
  • DB engine doesn't have any obligation to use the indexes if he decides not to use . what if you replace 'IN' with an 'OR' ? where account_id = 'accountId1' OR account_id = 'accountId2' Does it make any difference? – Ozan Sen Sep 02 '23 at 14:24
  • @OzanSen no the explain result is same and the query take +10 sec to complete – Ali Akbar Azizi Sep 02 '23 at 14:28

1 Answers1

2

MySQL cannot use the index to get rid of the ordering, because after e.g. a result row accountId1, Jan 10th, the next result can be either accountId1, Jan 9th or accountId2, Jan 10th. So MySQL has to first check "all other accounts" (which requires sorting) before knowing what comes next.

There are (at least) 2 things that you can try:

First, you can reduce the rows read per account_id by reading them separately:

select * from
(
  (
   select * from `comment` 
   where `account_id` = 'accountId1' 
   order by `date` desc, `id` desc 
   limit 20
  )
  union all
  (
   select * from `comment` 
   where `account_id` = 'accountId2' 
   order by `date` desc, `id` desc 
   limit 20
  )
) sub
order by `date` desc, `id` desc 
limit 20; 

This reads 20 rows per account_id, and then sorts just those n * 20 rows. So you skip reading and sorting potentially 100k rows. This logic is basically how MySQL executes your originally query - but without the internal limit (because it isn't optimized for it).

The second thing you can try is to add a covering index. Currently, MySQL uses your index to find a row, and then reads the row from the table to fulfill the select *, which is an additional step. For that, you can add an index (account_id, `date`, id, all other columns in your table). And if you do not actually need all columns in your result, adjust the select * to only select the columns you need (this reduces the size of the index).

To test the effect of this new index before actually adding it, you can e.g. change the select * to SELECT `date`, id FROM `comment` WHERE ... (both for your query and the rewritten code above). This is covered by your current index, and should give you a rough idea how much time the new index can save if it covers select *. It can happen that it is neglectable (especially if you use the query from the first suggestion).

In the execution plan, this should be reflected by showing using where instead of using index condition.

You can try both things independently from each other.

Solarflare
  • 10,721
  • 2
  • 18
  • 35
  • Thank you for your answer, Just one question, If I used covering index with primary key and then filter rows by ids because I only need first 20 rows is it good idea? for example `select * from comments where id IN ( select id from comments where account_id in (....) order by date desc, id desc limit 20 )` I just don't understand how covering index help for perfomance. – Ali Akbar Azizi Sep 02 '23 at 16:48
  • I just notice you can't use subquery with limit inside IN, but I can use JOIN and it should be same result right? – Ali Akbar Azizi Sep 02 '23 at 16:56
  • Well, your (non-working) code resembles what a covering index does: without it, in a first step, you do the index lookup (select id where...), and then, in a 2nd step, look up the other columns in the actual table (select * from ... *with the found ids*). A covering index saves that second step (the index already provides all columns). Nevertheless, the (absolute) effect is bigger if you read 100k rows than for 20 rows, so if you implement my first suggestion, you may not need it. But as I said, you can easily test the effect (also for "my" query) without actually adding the index. – Solarflare Sep 02 '23 at 17:31
  • thats perfect thank you for your answer – Ali Akbar Azizi Sep 02 '23 at 17:33