
I have two tables: A with 301 columns (the 1st named a1, int(11), Primary Key; the 2nd through 301st double(15,11)) and B with 33 columns (the 1st, b1, int(11), Unique Key; the 2nd, b2, varchar(100), Primary Key; ...; the 33rd, b33, int(11), MUL).

Both A & B have ~13,500,000 records.

My MySQL query, run once for every value of pos, where pos steps through (1, 1000, 2000, ..., 13500000) in increments of 1000:

select A.*, b2, b5, b7, b8, b10, b13, b33 from A join B on a1=b1 where b33 >= pos and b33 < pos+1000;

The query takes 1-5 seconds for values of b33 <= 600,000. After that it starts taking 20-30 seconds, and once b33 >= 8,000,000 it takes 60-70 seconds. I can't understand why the slowdown is happening: b33 is indexed, and the join is on a column that is the primary key in one table and a unique key in the other. Is there a workaround for this? This is really hampering the speed of the code, and I will have to split tables A & B into several smaller ones if nothing else works. I really hope I don't have to do that! Please help!
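The chunked loop described above can be sketched as follows, using SQLite and toy data (5,000 rows instead of 13.5M) purely to make the access pattern concrete; table and column names follow the question, but this is an illustration, not the real schema:

```python
import sqlite3

# Toy reproduction of the chunked join: one range query on b33 per
# 1000-row chunk, each joined back to A on the primary key.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE A (a1 INTEGER PRIMARY KEY, a2 REAL)")
cur.execute("CREATE TABLE B (b1 INTEGER UNIQUE, b2 TEXT, b33 INTEGER)")
cur.execute("CREATE INDEX idx_b33 ON B (b33)")  # b33 is indexed, as in the question
cur.executemany("INSERT INTO A VALUES (?, ?)", [(i, i * 0.5) for i in range(5000)])
cur.executemany("INSERT INTO B VALUES (?, ?, ?)",
                [(i, "row%d" % i, i) for i in range(5000)])

rows_seen = 0
for pos in range(0, 5000, 1000):  # one range query per 1000-row chunk
    cur.execute(
        "SELECT A.*, b2, b33 FROM A JOIN B ON a1 = b1 "
        "WHERE b33 >= ? AND b33 < ?", (pos, pos + 1000))
    rows_seen += len(cur.fetchall())
print(rows_seen)  # 5000
```

On the real tables, each chunk's ~1000 secondary-index hits on b33 fan out into ~1000 primary-key lookups on A, which is where the per-chunk cost lives.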

EDIT: Here is the output of EXPLAIN -

************* 1. row *************
id: 1
select_type: SIMPLE
table: B
type: range
possible_keys: b1,b33
key: b33
key_len: 4
ref: NULL
rows: 981
Extra: Using where
************* 2. row *************
id: 1
select_type: SIMPLE
table: A
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: DBName.B.b1
rows: 1
Extra:
2 rows in set (0.00 sec)

ayesha129

7 Answers


Since your database has several million records, are you doing anything to keep your DB in good health?

Running the following command nightly might help with general responsiveness if your data changes frequently (lots of inserts, maybe?):

mysqlcheck --check --analyze --auto-repair --all-databases --silent

Though I would suggest reading up a bit on mysqlcheck before running the command, just so you know what it's doing.

You should also look at optimizing your InnoDB configuration, especially innodb_buffer_pool_size (the more memory you can give it, the better). I was experiencing a similar slowness on a date-based field (which, of course, we immediately indexed) in a similarly sized table, and increasing the buffer pool size from the default 8 megabytes to several gigabytes made a very noticeable difference.
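For reference, the buffer pool is sized in my.cnf; the value below is purely illustrative and should be tuned to the RAM you can actually spare on the server:

```ini
[mysqld]
# Illustrative value only; on a dedicated database server this is often
# set to 50-70% of physical memory.
innodb_buffer_pool_size = 4G
```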

If you're deleting many rows from any table involved in the join, you might consider running OPTIMIZE TABLE as well.

Dave

I'm no MySQL (or anything!) guru, but here are some things I would consider. First, is b33 evenly distributed? Maybe it's slower because it is effectively retrieving more rows. Second, have you considered doing all the work in a single query instead of 13,500? Something like:

select A.*, b2, b5, b7, b8, b10, b13, b33, ((b33 - 1) DIV 1000) AS the_group
from A join B on a1=b1;
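A runnable sketch of this grouping idea on SQLite toy data; SQLite's integer `/` stands in for MySQL's `DIV`, and note that `(b33 - 1)` needs its own parentheses either way, since `DIV` binds tighter than `-`:

```python
import sqlite3

# Compute a chunk number per row with integer division instead of
# issuing thousands of separate range queries (toy data).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE B (b1 INTEGER PRIMARY KEY, b33 INTEGER)")
cur.executemany("INSERT INTO B VALUES (?, ?)", [(i, i) for i in range(1, 3001)])

cur.execute("""
    SELECT (b33 - 1) / 1000 AS the_group, COUNT(*)
    FROM B
    GROUP BY the_group
    ORDER BY the_group
""")
print(cur.fetchall())  # [(0, 1000), (1, 1000), (2, 1000)]
```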

Third, a wild guess: if your version of MySQL supports it, use an inline view to do the filtering first:

select A.*, b2, b5, b7, b8, b10, b13, b33 
from A join (select b1,b2, b5, b7, b8, b10, b13, b33 
             from B where b33 >= pos and b33 < pos+1000) B_NEW
     on a1=b1 ;

Fourth (this should really be first), run EXPLAIN and try to learn why the query is slow by comparing the fast queries with the slow ones.

Good luck!!

Samuel
  • Yes, b33 is evenly distributed. I've added the output of EXPLAIN to the question. – ayesha129 Jun 07 '11 at 09:06

Just a shot in the dark...

select A.*, b2, b5, b7, b8, b10, b13, b33 
  from A join B 
  on a1=b1 
  where b33 BETWEEN pos AND pos+999;
ic3b3rg

Can you show us the indexes you have set up on B? (I'm interested in how the index on b33 is defined, and whether it covers a single column or multiple columns.)

SHOW INDEXES FROM B;

Do you see the same speed decrease when you only select from B?

i.e.

select b2, b5, b7, b8, b10, b13, b33 from B where b33 >= pos and b33 < pos+1000;

Can you show us the part of the SHOW CREATE TABLE output that involves field b33? (I'm interested in whether NULL is allowed.)

Are you using MyISAM or InnoDB as the storage engine? (You can see this in the result of SHOW CREATE TABLE.)
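The isolation test suggested above can be sketched like this on SQLite toy data: time the b33 range scan on B alone, then the full join, and compare; on the real 13.5M-row tables, the gap between the two tells you whether the A-side lookups are the cost:

```python
import sqlite3
import time

# Time a B-only range scan vs. the full join (toy data).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE A (a1 INTEGER PRIMARY KEY, a2 REAL)")
cur.execute("CREATE TABLE B (b1 INTEGER UNIQUE, b33 INTEGER)")
cur.execute("CREATE INDEX idx_b33 ON B (b33)")
cur.executemany("INSERT INTO A VALUES (?, ?)", [(i, 0.0) for i in range(10000)])
cur.executemany("INSERT INTO B VALUES (?, ?)", [(i, i) for i in range(10000)])

def timed(sql, params):
    t0 = time.perf_counter()
    rows = cur.execute(sql, params).fetchall()
    return len(rows), time.perf_counter() - t0

n_b, t_b = timed("SELECT b33 FROM B WHERE b33 >= ? AND b33 < ?", (8000, 9000))
n_j, t_j = timed("SELECT A.*, b33 FROM A JOIN B ON a1 = b1 "
                 "WHERE b33 >= ? AND b33 < ?", (8000, 9000))
print(n_b, n_j)  # both 1000; compare t_b and t_j on the real data
```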

Eljakim

Explain plan and indexes seem fine.

I suggest you compare the profiles and see where the time really goes:

SET profiling=1;

select A.*, b2, b5, b7, b8, b10, b13, b33 from A join B on a1=b1 where b33 >= 0 and b33 < 1000;
SHOW PROFILE;

select A.*, b2, b5, b7, b8, b10, b13, b33 from A join B on a1=b1 where b33 >= 1000000 and b33 < 1001000;
SHOW PROFILE;


SET profiling=0;

But I think it may be slow because past ~600k the index no longer fits into memory, and more disk seeks are made.

Imre L

YOU NEED TO REFACTOR THIS QUERY!!!

Here is your old query:

select A.*, b2, b5, b7, b8, b10, b13, b33
from A join B on a1=b1 where b33 >= pos and b33 < pos+1000;

Here is the new one:

SELECT
    AAA.*,b2,b5,b7,b8,b10,b13,b33
FROM
    A AAA INNER JOIN
    (
        select
            A.a1,b2,b5,b7,b8,b10,b13,b33
        from
            A INNER JOIN
            (
               SELECT
                   b1,b2,b5,b7,b8,b10,b13,b33
               FROM B
               WHERE
                    b33 >= pos and
                    b33 < pos+1000
            ) BB
            ON A.a1=BB.b1
    ) BBB
    USING (a1)
;

CAVEAT

The goal of this refactored query is to make the temp tables within the query plan as small as possible. In fact, subquery BBB should never have more than 1000 rows at any given time.

Give it a Try !!!
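The core of this refactor (the innermost derived table, which is what keeps the row set small) can be sketched on SQLite toy data like so: filter B down to the ~1000-row chunk first, then join A to that:

```python
import sqlite3

# Filter B to the chunk inside a derived table, then join A to it,
# so the join never touches more than ~1000 B rows (toy data).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE A (a1 INTEGER PRIMARY KEY, a2 REAL)")
cur.execute("CREATE TABLE B (b1 INTEGER UNIQUE, b2 TEXT, b33 INTEGER)")
cur.executemany("INSERT INTO A VALUES (?, ?)", [(i, 0.0) for i in range(5000)])
cur.executemany("INSERT INTO B VALUES (?, ?, ?)",
                [(i, "r%d" % i, i) for i in range(5000)])

pos = 2000
cur.execute("""
    SELECT A.*, BB.b2, BB.b33
    FROM A
    JOIN (SELECT b1, b2, b33
          FROM B
          WHERE b33 >= ? AND b33 < ?) AS BB
      ON A.a1 = BB.b1
""", (pos, pos + 1000))
rows = cur.fetchall()
print(len(rows))  # 1000
```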

RolandoMySQLDBA

ayesha129,

try moving the b33 constraints into the join clause. It sounds like the optimizer is only applying one of the b33 constraints before building the join set.

select A.*, b2, b5, b7, b8, b10, b13, b33 from A join B
  on a1=b1 and b33 >= pos and b33 < pos+1000;

This way the optimizer should use the b33 index and reduce the B row set to 1000 before attempting the join.
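One reassuring property of this rewrite, sketched below on SQLite toy data: for an INNER JOIN, moving the range predicates from WHERE into ON returns exactly the same rows, so the only question is what the optimizer does with each form:

```python
import sqlite3

# Compare the b33 range in the ON clause vs. the WHERE clause (toy data);
# for an inner join both forms are semantically equivalent.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE A (a1 INTEGER PRIMARY KEY)")
cur.execute("CREATE TABLE B (b1 INTEGER UNIQUE, b33 INTEGER)")
cur.executemany("INSERT INTO A VALUES (?)", [(i,) for i in range(3000)])
cur.executemany("INSERT INTO B VALUES (?, ?)", [(i, i) for i in range(3000)])

pos = 1000
in_on = cur.execute(
    "SELECT a1, b33 FROM A JOIN B ON a1 = b1 AND b33 >= ? AND b33 < ?",
    (pos, pos + 1000)).fetchall()
in_where = cur.execute(
    "SELECT a1, b33 FROM A JOIN B ON a1 = b1 WHERE b33 >= ? AND b33 < ?",
    (pos, pos + 1000)).fetchall()
print(len(in_on), sorted(in_on) == sorted(in_where))  # 1000 True
```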

Wayne Walker