We have two tables:

create table A (
  id int,
  a int,
  b int,
  c varchar(255),
  unique key K1 (a,c),
  key K2 (b,c),
  key K3 (a,b,c)
);
create table B (
  id int,
  b int,
  c varchar(255),
  x varchar(255),
  unique key K (b,c)
);

Running queries like:

SELECT B.x 
FROM B 
INNER JOIN A 
   ON A.b = B.b 
  AND A.c = B.c
WHERE A.a IN (...a values...);

The `a` values are supplied from the client language (Ruby, in our case), and there are roughly 10 to 100,000 of them.

The EXPLAIN output looks like this:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: A
         type: range
          key: K1
      key_len: 4
          ref: NULL
         rows: 100
        Extra: Using where; Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: B
         type: ref
          key: K
      key_len: 4
          ref: A.b
         rows: 213
        Extra: Using index condition
2 rows in set (0.00 sec)

This query works great when A and B are small, but it becomes slow once the tables exceed about 20 million rows. I suspect the composite key is not working well on the join. How can I solve this?

mtomita
  • You should read MySQL index [**TIPS**](http://mysql.rjweb.org/doc.php/index_cookbook_mysql) – Juan Carlos Oropeza May 24 '16 at 17:15
  • You mean `WHERE A.a IN (...a values...);` can have 100k `a` values? That doesn't seem optimal for an index search. It is like joining `table A` against `Avalues`, but `Avalues` has no index. Also, you don't have a composite index on A `(b,c)`, so your join `ON` isn't using the proper index – Juan Carlos Oropeza May 24 '16 at 17:18
  • `(...a values...)` is actually something like `(12,16,295,1053)`. An index on A `(b,c)` exists, but I omitted it because the optimizer did not use it, sorry. I have edited to add all indices. – mtomita May 24 '16 at 18:09
  • Then why did you say `about 10-100,000 items`? Read the index tips. – Juan Carlos Oropeza May 24 '16 at 18:10

1 Answer


c is a VARCHAR(255) column, which is pretty big to join on in terms of length, and it can hold any characters, not just digits. If the VARCHAR is in fact always 255 characters long, you should change it to CHAR and see how that fares.
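
A minimal sketch of that change, assuming the values really are fixed-length (note this rewrites both tables, which can take a while at 20 million rows):

-- assumes every value of c really is 255 characters long
ALTER TABLE A MODIFY c CHAR(255);
ALTER TABLE B MODIFY c CHAR(255);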

However, if you have 20 million rows, your auto-increment id is at most 8 digits long and consists only of digits, which suits your join much better. Add a reference to B's id to table A like this:

create table A (
  id int,
  a int,
  b int,
  c varchar(255),
  b_id int, -- additional field
  unique key K1 (a,c),
  key K2 (b,c),
  key K3 (a,b,c)
);
create table B (
  id int,
  b int,
  c varchar(255),
  x varchar(255),
  unique key K (b,c)
);

Then run this once:

UPDATE A
INNER JOIN B
   ON A.b = B.b
  AND A.c = B.c
SET A.b_id = B.id;

Then your query becomes:

SELECT B.x 
FROM B 
INNER JOIN A 
   ON A.b_id = B.id
WHERE A.a IN (...a values...);
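
Note that this join relies on `B.id` being indexed, and the schema above declares no key on it. Assuming `id` is meant to be the primary key (the index name `K4` below is just illustrative), something like this would also help:

-- assumption: id is intended as B's primary key
ALTER TABLE B ADD PRIMARY KEY (id);
-- optional: covers both the IN filter and the join column
ALTER TABLE A ADD KEY K4 (a, b_id);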

That should run in a sensible time.

Edit:

Is your varchar(255) always 255 characters long, or can it be much shorter? If the values are usually shorter, you could simply shorten the column.

If you copy the table with all its records without locking, perform the b_id backfill on the copy, and then, in a short downtime window (if possible), rename the copy to replace the original table.
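
A minimal sketch of that copy-and-swap approach, assuming the copy is named `A_copy` (rows written to the original `A` during the copy would still need reconciling separately):

CREATE TABLE A_copy LIKE A;
ALTER TABLE A_copy ADD COLUMN b_id INT;

-- copy the existing rows, then backfill the new column
INSERT INTO A_copy (id, a, b, c)
SELECT id, a, b, c FROM A;

UPDATE A_copy
INNER JOIN B
   ON A_copy.b = B.b
  AND A_copy.c = B.c
SET A_copy.b_id = B.id;

-- atomic swap; the old table is kept around as A_old
RENAME TABLE A TO A_old, A_copy TO A;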

You can use the following question to decide whether this is the right step for you:

Surrogate vs. natural/business keys

Paul Stanley
  • Thank you, I tried it and confirmed that performance improves, but our writer service is quite busy, so the extra auto-increment id fetch & update might increase cost (because the A record is created before B, an update to A.b_id is required afterwards). Let me think... – mtomita May 25 '16 at 04:56