0

Thanks in advance for taking the time to read my question. I'm using MySQL and Python together and I'm executing first this SQL command

  SELECT T1.col1 AS t, T1.col4 AS d, T2.col3 AS e, T2.col4 AS d, T2.col7 AS p,
  T2.col13 AS de, 
  T2.col5 AS carrier, T2.col12 AS f, T2.col10 AS c
  FROM table AS T1
  JOIN table_c AS T2 ON T1.col1 = T2.col1
  ORDER BY RAND( )
  LIMIT 100000 ;

The databases I'm working on are quite big (T1 is of 300 000 000 lines and T2 is of 300 000 lines) and I created them thanks to phpMyAdmin . I created indexes also on the col1 of T1 and col1 of T2. But when I run the SQL command it gets stuck on the copying to tmp table phase message ( I left it for the whole weekend and it's still at the same point)

I tried modifying the parameter tmp_table_size but it didn't change anything

I have no idea what could be the solution for me to execute this command in a reasonable time

All suggestions are welcome

Thank you so much for your help

Rim
  • 13
  • 1
  • 3
  • Please remove the ORDER BY and LIMIT clause and let us know how many rows it returns, and in what time frame. Also, please explain what you're trying to do. Why return 100,000 random rows? – Marcus Adams Jun 25 '12 at 12:50
  • hello thank you for answering, I did take them out and the query ran smoothly since I needed it just as an addition to get the most significant set of data, I think I will take the 'ORDER BY RAND()' out to speed up the query – Rim Jun 25 '12 at 13:49

2 Answers2

1

The ORDER BY RAND() trick should never be used with such large table sizes. ORDER BY RAND() works by assigning a random value for each row in your JOINed result set (in your case probably hundreds of millions of rows), actually creating all those rows, sorting them on disk, and if it's ready, only then can it choose and return the top 100000 rows for you.

Solutions are not obvious, but you can find some techniques to fight with your problem, like How can i optimize MySQL's ORDER BY RAND() function?

Community
  • 1
  • 1
biziclop
  • 14,466
  • 3
  • 49
  • 65
0
  1. Use EXPLAIN EXTENDED SELECT ... to see what the query involves (how many estimated rows to work on, which indexes are in use, etc).

  2. EXPLAIN query will also show "1 warning(s)", see SHOW WARNINGS - that will show you how exactly your query looks like after query optimizer.

  3. Post your tables structure (SHOW CREATE TABLE ...), table status for both tables, and relevant configuration (all configuration entries matching max*, *buffer*, *size*).

  4. Post your hardware configuration (cpu - speed?, cores?, ram - total? free?)

poncha
  • 7,726
  • 2
  • 34
  • 38