
I've got a simple query with a big IN clause:

SELECT test_id FROM sample WHERE id IN (...99,000 ids...);

The explain gives me this result:

id  select_type  table    type   possible_keys  key         key_len   ref   rows    Extra   
1   SIMPLE       sample   range  PRIMARY        PRIMARY     4         NULL  40      Using where

The id field is the primary key of the sample table (~320,000 rows) and test_id is a foreign key to the test table; both are MySQL InnoDB tables. The query takes over 2,000 seconds! I tried joining the tables instead, but that took a similar time. After some research I found this topic, but the accepted answer only describes what the problem may be (which, to be honest, I don't understand :/ ) and offers no solution other than

If these are in cache, the query should run fast

How can I speed up this query? Please be as precise as possible because, as I've found out, I'm an optimization novice.

EDIT 1: SHOW CREATE TABLE sample

CREATE TABLE `sample` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `test_id` int(11) NOT NULL,
 ...
 PRIMARY KEY (`id`),
 KEY `sample_FI_1` (`test_id`),
 ... other keys ...,
 CONSTRAINT `sample_FK_1` FOREIGN KEY (`test_id`) REFERENCES `test` (`id`),
 ... other foreign keys ...
) ENGINE=InnoDB AUTO_INCREMENT=315607 DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci

There was a simple join, something like this:

SELECT t.* FROM test t JOIN sample s ON t.id = s.test_id JOIN sample_x x ON s.id = x.sample_id WHERE x.field_id = '321' AND x.value LIKE '%smth%';

innodb_buffer_pool_size:

SELECT @@innodb_buffer_pool_size/1024/1024/1024;

@@innodb_buffer_pool_size/1024/1024/1024    
 24.000000000000

Statuses:

SHOW TABLE STATUS FROM zert1442

Name             sample               test
Engine           InnoDB               InnoDB
Version          10                   10
Row_format       Compact              Compact
Rows             357323               174915
Avg_row_length   592                  519
Data_length      211632128            90865664
Max_data_length  0                    0
Index_length     54837248             33947648
Data_free        7340032              4194304
Auto_increment   315647               147167
Create_time      2017-02-15 10:22:03  2017-02-15 10:22:03
Update_time      NULL                 NULL
Check_time       NULL                 NULL
Collation        utf8_general_ci      utf8_general_ci
Checksum         NULL                 NULL

(other tables omitted)
Agrest
  • Provide `SHOW CREATE TABLE sample` output in your question. – Willem Renzema Jul 01 '20 at 20:49
  • 1
    Also, provide the `JOIN` version of your query, along with the `SHOW CREATE TABLE` for any additional tables involved. Likely this can be fixed with an appropriate index. – Willem Renzema Jul 01 '20 at 21:24
  • 90K ids will be slow. But not 2000 seconds. Please provide what Willem asked for, plus RAM size and `innodb_buffer_pool_size`. Also `SHOW TABLE STATUS` so we get a feel for the size of the table. – Rick James Jul 02 '20 at 03:59
  • OK, I've added the information you asked for. – Agrest Jul 02 '20 at 16:24

2 Answers


Here is your query:

SELECT t.* 
FROM test t 
JOIN sample s ON t.id = s.test_id 
JOIN sample_x x ON s.id = x.sample_id 
WHERE x.field_id = '321' 
AND x.value LIKE '%smth%'

Unfortunately you didn't provide the SHOW CREATE TABLE output for the test or sample_x tables.

Regardless, add this index if it doesn't already exist:

ALTER TABLE sample_x
ADD INDEX `wr1` (`sample_id`,`field_id`,`value`)

This should improve things a bit. However, a LIKE with a wildcard at the start of the string cannot use an index (at least not without a fulltext index).
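
For illustration only, here is a minimal sketch of the fulltext route. It assumes MySQL 5.6+ (InnoDB fulltext support); the index name ft_value is made up, and note that MATCH ... AGAINST does word-based matching (subject to innodb_ft_min_token_size), so it is not a drop-in replacement for LIKE '%smth%':

-- sketch: fulltext index instead of a leading-wildcard LIKE
ALTER TABLE sample_x
ADD FULLTEXT INDEX `ft_value` (`value`);

SELECT t.*
FROM test t
JOIN sample s ON t.id = s.test_id
JOIN sample_x x ON s.id = x.sample_id
WHERE x.field_id = '321'
  AND MATCH(x.value) AGAINST('smth' IN NATURAL LANGUAGE MODE);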

You can also try this index:

ALTER TABLE sample_x
ADD INDEX `wr2` (`field_id`,`value`,`sample_id`)

This will allow the optimizer to start at the sample_x table and then work backwards towards the test table. Which approach it prefers will depend on a lot of factors.
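
You can check which plan the optimizer actually picks by prefixing the query with EXPLAIN and looking at the key column reported for the x (sample_x) row:

EXPLAIN
SELECT t.*
FROM test t
JOIN sample s ON t.id = s.test_id
JOIN sample_x x ON s.id = x.sample_id
WHERE x.field_id = '321'
  AND x.value LIKE '%smth%';

If key shows wr2, the optimizer is starting at sample_x; wr1 means it is coming in from the sample side.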

You can remove either of these indexes with the following:

ALTER TABLE sample_x
DROP INDEX `wr1`

Or

ALTER TABLE sample_x
DROP INDEX `wr2`

Experiment to see which helps your query the most, if either of them does. When measuring performance, always run the query twice and throw out the first result. That is because the first run populates the buffer cache, so it takes longer and won't be an accurate measure of the real improvement.
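
If both indexes exist, you can also time each plan directly instead of leaving the choice to the optimizer. A rough sketch using an index hint (swap wr2 for wr1 to test the other):

SELECT t.*
FROM test t
JOIN sample s ON t.id = s.test_id
JOIN sample_x x FORCE INDEX (wr2) ON s.id = x.sample_id
WHERE x.field_id = '321'
  AND x.value LIKE '%smth%';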

Willem Renzema
  • As I wrote before, for now I don't have this join query; I split it into subqueries to see where the problem is. While finding the correct sample_ids from sample_x takes about 4 s (which is absolutely OK), looking up test_id from sample by its PK takes over 2,000 s. Do you think these indexes will reduce that by at least half? – Agrest Jul 02 '20 at 22:01
  • @Agrest You find that out by adding the index and seeing for yourself. It should only take you a couple minutes to see what effect they have. Query optimization is sometimes trial and error. Add the indexes, run the query, report results; we'll iterate from there to see what other improvements can be made. – Willem Renzema Jul 03 '20 at 02:33

First things first: what do you consider 'slow'? You're getting 99k records out of a table; that is bound to take some time!

In my experience, if your IN() list contains 99,000 values, I'd REALLY consider putting them into a (temporary) table first, adding a unique index (or PK if you prefer) on that table, and then JOINing it against your sample table, as sketched below. That said, I'm not sure what the fastest way would be to get 99k ids into that (temporary) table; in .NET/MSSQL I'd use the SqlBulkCopy object, but I'm not sure what you're using there.
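
A minimal sketch of that staging idea (the table name tmp_ids is made up, and the VALUES list is obviously a stand-in for the real ids, inserted in batches of a few thousand per statement):

-- stage the ids once, with a primary key so the join can use it
CREATE TEMPORARY TABLE tmp_ids (
  id INT NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

-- multi-row inserts, a few thousand ids per statement
INSERT INTO tmp_ids (id) VALUES (1), (2), (3);

-- then join instead of using the huge IN() list
SELECT s.test_id
FROM sample s
JOIN tmp_ids t ON t.id = s.id;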

PS: You can of course simply stick to rather verbose INSERT statements, but I fear that inserting 99k values that way will take even more time than what you're seeing now. Where are these values coming from in the first place? The more I think about it, the more I suspect your underlying approach might be 'off'.
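
If the ids happen to live in a file already (just a guess on my part), LOAD DATA is usually far faster than thousands of individual INSERTs. A sketch, assuming one id per line and that local_infile is enabled on your server (the path is hypothetical):

LOAD DATA LOCAL INFILE '/tmp/ids.txt'
INTO TABLE tmp_ids
LINES TERMINATED BY '\n'
(id);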

deroby