6

I have a query

SELECT DISTINCT phoneNum 
FROM `Transaction_Register` 
WHERE phoneNum NOT IN (SELECT phoneNum FROM `Subscription`) 
LIMIT 0 , 1000000

It takes too much time to execute b/c Transaction_Register table has millions of records is there any alternative of above query I will be grateful to you guys if there is any.

sarwar026
  • 3,821
  • 3
  • 26
  • 37
developerpk
  • 61
  • 1
  • 1
  • 4
  • 1
    do you want something faster, do you have some better alternative in mind? What are the schemas of the tables, how is the data distributed (statistics), do you have an execution plan `EXPLAIN`? What indecies do you have? – Jodrell Jun 03 '13 at 16:49
  • replace your MySQL binaries with MariaDb instead. Its much better than MySQL for this kind of query. The reason being is that it has a much better query planner. – Aron Jun 03 '13 at 16:52
  • 1
    @Aron SQL has been around for a while, there are many engines to choose with different features and costs. If you put a turd on a shovel or a golden platter, it still smells bad. – Jodrell Jun 03 '13 at 16:55
  • Try adding an index to your table. – Gimmy Jun 03 '13 at 16:55
  • 1
    @Jodrell What exactly were you commenting on? The costs of each SQL statement is different for each DB Engine. MySQL is known for being unable to efficiently use subqueries. – Aron Jun 03 '13 at 16:58
  • @Aron but the question is about MySql. If you wanted to change engines there are many options. – Jodrell Jun 03 '13 at 17:00
  • possible duplicate of [MySQL "NOT IN" query](http://stackoverflow.com/questions/1519272/mysql-not-in-query) – Jodrell Jun 03 '13 at 17:11
  • Do you have any indexes on these tables? – Taryn Jun 03 '13 at 18:06
  • @Jodrell not really. The point OF MariaDB is that it IS MySQL. It is a fork by the original developers of MySQL from the Oracle branch. Thus it IS an answer to a MySQL question. Fact is no one of note actually uses Oracle MySQL. FB uses their own fork, Twitter maintains their own fork, and both of those feed into the open source MariaDB, which powers Wikipedia. The official recommendation of FB is to use MariaDB. – Aron Jun 04 '13 at 02:24

2 Answers2

17

An alternative would be to use a LEFT JOIN:

select distinct t.phoneNum
from Transaction_Register t
left join Subscription s
  on t.phoneNum = s.phoneNum
where s.phoneNum is null
LIMIT 0 , 1000000;

See SQL Fiddle with Demo

Taryn
  • 242,637
  • 56
  • 362
  • 405
5

I doubt whether LEFT JOIN truly perform better than NOT IN. I just perform a few tests with the following table structure (if I am wrong please correct me):

account (id, ....)   [42,884 rows, index by id]
play (account_id, playdate, ...)   [61,737 rows, index by account_id]

(1) Query with LEFT JOIN

SELECT * FROM
account LEFT JOIN play ON account.id = play.account_id
WHERE play.account_id IS NULL

(2) Query with NOT IN

SELECT * FROM
account WHERE
account.id NOT IN (SELECT play.account_id FROM play)

Speed test with LIMIT 0,...

LIMIT 0,->   100      150      200      250
-------------------------------------------------------------------------
LEFT         3.213s   4.477s   5.881s   7.472s
NOT EXIST    2.200s   3.261s   4.320s   5.647s
--------------------------------------------------------------------------
Difference   1.013s   1.216s   1.560s   1.825s

As I increase the the limit, the difference is getting larger and larger


With EXPLAIN

(1) Query with LEFT JOIN

SELECT_TYPE   TABLE      TYPE   ROWS    EXTRA
-------------------------------------------------
SIMPLE         account   ALL    42,884
SIMPLE         play      ALL    61,737  Using where; not exists

(2) Query with NOT IN

SELECT_TYPE          TABLE      TYPE   ROWS   EXTRA
-------------------------------------------------
SIMPLE               account   ALL    42,884  Using where
DEPENDENT SUBQUERY   play      INDEX  61,737  Using where; Using index

It seem like the LEFT JOIN does not make use of index

LOGIC

(1) Query with LEFT JOIN

After LEFT JOIN between account and play will produce 42,884 * 61,737 = 2,647,529,508 rows. Then check if play.account_id is NULL on those rows.

(2) Query with NOT IN

Binary search takes log2(N) for item existence. That's mean 42,884 * log2(61,737) = 686,144 steps

invisal
  • 11,075
  • 4
  • 33
  • 54