0

Im trying to make efficient SQL code for MySQL to get some values, but in random order and in different amount. The problem is that tables are quite big (~4m of rows, ~400 MB) and I dont have much time to do it (for now it takes about ~1-2 minutes for each try). Also, there is index for each of columns, but not UNIQUE and it is string value, not auto-inc val.

Im generating long SQL query:

(SELECT fieldA,'id1' AS id FROM myTable WHERE (fieldB LIKE 'xxxx:%') ORDER BY RAND() LIMIT 7)
UNION ALL
(SELECT fieldA,'id2' AS id FROM myTable WHERE (fieldB ='123123') ORDER BY RAND() LIMIT 5)
etc...

I would like to order this table only once (this takes so much time). I've already tried:

and I had only luck with the last one (III. suggestion from OP), but "magic" number of 16 doesnt do a trick - this is good for smaller tables and not for tables with ~4000000 rows.

This is ouput of sample EXPLAIN:

id  select_type     table   type    possible_keys   key     key_len     ref     rows    Extra   
1   PRIMARY     myTable     range   fieldB      fieldB      143     NULL    64198   Using where; Using temporary; Using filesort
2   UNION   myTable     ALL     NULL    NULL    NULL    NULL    4386943     Using where; Using temporary; Using filesort
3   UNION   myTable     range   fieldB      fieldB      143     NULL    34374   Using where; Using temporary; Using filesort
4   UNION   myTable     ref     fieldB      fieldB      143     const   1999    Using where; Using temporary; Using filesort
5   UNION   myTable     range   fieldB      fieldB      143     NULL    1   Using where; Using temporary; Using filesort    NULL
UNION RESULT    <union1,2,3,4,5>    ALL     NULL    NULL    NULL    NULL    NULL    

So my guess is that ORDER BY RAND is the main problem - it makes "Using temporary; Using filesort" for every UNION parts.

Table definition:

CREATE TABLE IF NOT EXISTS `myTable` (
  `fieldA` varchar(42) NOT NULL,
  `XYZ` varchar(36) NOT NULL,
  `fieldB` varchar(47) NOT NULL,
  KEY `fieldA` (`fieldA`),
  KEY `XYZ` (`XYZ`),
  KEY `fieldB` (`fieldB`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

It stores just simple, short strings, but a lot of them.

Any advices, meaby there is different approach?

@edit, right now Im using both MySQL and PHP to achive it:

  1. Im getting list of reqired values of fieldB, by making UNIONs for

    SELECT fieldB, "xxxx:%" AS orygLike FROM myTable WHERE fieldB LIKE "xxxx:%" GROUP BY fieldB
    

    etc for each UNIONed query - only for those in LIKE mode, if this is '=' I already know which fieldB is valid :)

  2. Then Im able to make mapping array of fieldBVal => orygLIKE (for example "xxxx:yyyy"=>"xxxx:%")

  3. I list ALL the IDs of fieldA's which can be used, by ID using WHERE id IN (id1,id2,id3...) - this way I have all IDs which may be used. Right here Im combining arrays together and select random IDs with array_rand .

  4. Simple:

    SELECT * FROM myTable WHERE id IN (RndID1, RndID2, RndID3 etc...)
    

Its very fast and gives good results :)

Thanks to fancyPants for pointing about ID auto-inc field

Community
  • 1
  • 1
Jakub Krol
  • 350
  • 5
  • 16

1 Answers1

2

You have query in there that's scanning all rows of the table.

See this row from your explain

2   UNION   myTable     ALL     NULL    NULL    NULL    NULL    4386943     Using where; Using temporary; Using filesort

That's a huge performance killer as well. Use table aliases to see exactly which query it is and see if you can do something by adjusting indexes.

Maybe you can also rewrite your query to sort the table just once and then a compound index could be even better than having those 3 separate indexes.

Have a try with this query (but please note, that it doesn't guarantee you get 7 rows with fieldB like 'xxx:%' and 5 rows with fieldB = '123123' and so on):

SELECT 
fieldA, 
CASE WHEN fieldB LIKE 'xxxx:%' THEN 'id1'
     WHEN fieldB ='123123' THEN 'id2'
END AS id 
FROM myTable 
WHERE 
(fieldB LIKE 'xxxx:%') 
OR fieldB ='123123'
ORDER BY RAND() 
LIMIT 12 /*7 + 5*/

EDIT:

The "LIKE '%'" is of course useless, as this selects every row. It literarily says "give me whatever". If you want to make it ultrafast, here's an idea:

Add a column like this:

ALTER TABLE yourTableName ADD COLUMN id INT AUTO_INCREMENT PRIMARY KEY;

Then you get the greatest id available in your table and calculate your randoms before hand:

SET @my_max := (SELECT MAX(id) FROM yourTableName);
SET @r := RAND() * @my_max;
SELECT * FROM yourTable WHERE id >= @r LIMIT 1;

Do it again if you need more of them. I did the >= and the LIMIT 1 instead of a simple id = @r in case you delete some rows some time.

At least this part of the query is then lightning fast.

fancyPants
  • 50,732
  • 33
  • 89
  • 96
  • Well I did it - gave aliases and this is where i do "LIKE '%'". Dont really understand, it has LIMIT, so why is is done anyway? I changed it into one query without WHERE but no changes. Yes, this is TRUE it has huge impact. I wanted to try some nice subquery (for example get one ordered version of myTable, then select some rows from it for "LIKE '%'" (so any random) and then other queries to get required rows). Anyway I cannot manage to get expected results... The same is with your try - Ive already tried it but, as you pointed, it wont give me expected results... – Jakub Krol Sep 25 '13 at 12:58
  • Your edit is great. I really like it - I just need to make huge SQL queries (or query quite often), but it is very fast and I get expected results. The only thing I dont want is that randomness isnt "full", as Im not getting rows from last 80% of table. Also tried MediaWiki`s trick with "random" column but no success. – Jakub Krol Sep 25 '13 at 19:31
  • +1 Yes the best solution is to generate a random id and pick the first row >= that random value. Assuming the id's are evenly distributed, this works well. But FWIW, `UNION` still creates temp tables, but they will be small with just 5-7 rows each. – Bill Karwin Sep 25 '13 at 21:25