1

We have 2 tables:

CREATE TABLE `task_information` 
(
    `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
    `project_id` varchar(64) NOT NULL DEFAULT '0',
    `task_id` varchar(64) NOT NULL DEFAULT '0',
    `machine_id` varchar(16) NOT NULL DEFAULT '0',
    PRIMARY KEY (`id`),
    KEY `idx_project` (`project_id`,`task_id`),
    KEY `idx_task` (`task_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1100523 DEFAULT CHARSET=utf8mb4

CREATE TABLE `task_process` 
(
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `task_id` varchar(64) NOT NULL DEFAULT '0',
  `project_id` varchar(64) NOT NULL DEFAULT '0',
  `status` varchar(16) NOT NULL DEFAULT '0'
  PRIMARY KEY (`id`),
  UNIQUE KEY `uniq_task` (`task_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1096437 DEFAULT CHARSET=utf8mb4

These two tables have a one-to-one relationship based on "task_id".

The two tables have almost 800K entries. Also there will be lots of "success" in the "status" column. The "status" column has exactly three values: "success", "fail" and "start", their ratio is roughly 100:1:1.

And now I want to select the task_id where machined_id is specified and the status equals "success", my sql query looks like this:

select * 
from task_information 
inner join task_process on task_process.task_id = task_information.task_id  
                        and task_information.machine_id = "RA00906" 
where task_process.status="success"\G;

This one takes 4.96s, the result from explain is as follows:

*************************** 1. row ***************************
id: 1
select_type: SIMPLE
    table: task_process
partitions: NULL
     type: ALL
possible_keys: uniq_task
      key: NULL
  key_len: NULL
      ref: NULL
     rows: 760145
 filtered: 10.00
    Extra: Using where
*************************** 2. row ***************************
       id: 1
select_type: SIMPLE
    table: task_information
partitions: NULL
     type: ref
possible_keys: idx_task
      key: idx_task
  key_len: 258
      ref: task_process.task_id
     rows: 1
 filtered: 10.00
    Extra: Using where

But if I change "=" to "like" the speed becomes much faster (around 0.9 seconds):

select status 
from task_information 
inner join task_process on task_process.task_id = task_information.task_id 
                        and task_information.machine_id = "RA00906" 
where task_process.status like "success"

The result from explain is like this:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: task_information
   partitions: NULL
         type: ALL
possible_keys: idx_task
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 759749
     filtered: 10.00
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: task_process
   partitions: NULL
         type: eq_ref
possible_keys: uniq_task
          key: uniq_task
      key_len: 258
          ref: task_information.task_id
         rows: 1
     filtered: 11.11
        Extra: Using where

Both of these queries return around 500 rows, but "LIKE" is much faster than "=" even though functionally they should be roughly the same.

Can someone please explain to me the reason behind this?

tl;dr using "LIKE" is much faster than "=" in the same "inner join" query, what is the reason behind this?

blueishpoop
  • 226
  • 1
  • 10
langker
  • 63
  • 5
  • check this ! https://stackoverflow.com/questions/6142235/sql-like-vs-performance – Muhammad Waheed Jan 25 '19 at 05:24
  • `where task_process.status="success"\G;` What does `\G` do? – Thilo Jan 25 '19 at 06:44
  • If you have a lot of different `machine_id`, an index on that column should help a lot. – Thilo Jan 25 '19 at 06:48
  • Is this reproducable? Or did you run the `LIKE` query immediately after the `=` query and everything just happened to be in the buffer cache? – Thilo Jan 25 '19 at 06:48
  • Also, why are these two separate tables? Could not `status` be a column on `task_information`? – Thilo Jan 25 '19 at 06:52
  • Perhaps hot/cold data? – jarlh Jan 25 '19 at 08:03
  • @jarlh hi, just wondering what is meaning of hot/cold data? key word pls? – langker Jan 25 '19 at 10:17
  • @Thilo the bottleneck is on status, when I get rid of the status, the sql runs fast. the problem is in this case, the like runs much more faster than equal, which I can understand. – langker Jan 25 '19 at 10:19
  • Hot => RAM, cold => disk. – jarlh Jan 25 '19 at 11:49
  • The number of rows has changed between the two runs. This means that you are not working in a static environment, so in all likelihood, other things running on the database/server are affecting performance. – Gordon Linoff Jan 25 '19 at 12:32
  • @GordonLinoff you are right. The row has been changed during my query, but don't know why, it is just OK now – langker Jan 29 '19 at 02:51

0 Answers0