1

I am migrating millions of rows from table1 to table2. After the migration, I am trying to spot check certain rows to see if they are migrated properly. Here is a sample SQL query [not the exact one]:

SELECT     tbl1.name,
           tbl1.address,
           tbl2.name,
           tbl2.new_address
FROM       
          (SELECT * 
           FROM table1
           ORDER BY   RAND() limit 10) tbl1
INNER JOIN table2 tbl2
ON         tbl1.name = tbl2.name;

I am trying to select 10 rows from table 1 to join with table2 for checking. If I ran this query with an explain, it does a full table scan i.e., all rows in table 1 (which is > 130 million). How to optimize this query in MySQL?

ssk
  • 9,045
  • 26
  • 96
  • 169
  • Many good approaches here https://stackoverflow.com/questions/4329396/mysql-select-10-random-rows-from-600k-rows-fast Off hand, I'd favor the generating random offsets one, if that'd work for your data, less strenuous SQL.. – George M Reinstate Monica Jul 27 '18 at 21:57

1 Answers1

1

Based on the suggestions from http://bigdatums.net/2017/01/05/select-random-sample-records-mysql/

SELECT     tbl1.name,
           tbl1.address,
           tbl2.name,
           tbl2.new_address
FROM       
          (SELECT * 
           FROM table1
           WHERE RAND() < 0.2 LIMIT 10) tbl1
INNER JOIN table2 tbl2
ON         tbl1.name = tbl2.name;

seems to be sufficient for my case i.e., RAND() < 0.2 LIMIT 10.

ssk
  • 9,045
  • 26
  • 96
  • 169