4

I have 2 tables as follow (that was a phpMyAdmin dump which is why it have the ALTER TABLE):

CREATE TABLE IF NOT EXISTS `definition` (
`id` int(10) unsigned NOT NULL,
  `page_id` int(10) unsigned NOT NULL,
  `title` varchar(255) COLLATE utf8_bin NOT NULL
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 COLLATE=utf8_bin AUTO_INCREMENT=2621401 ;

CREATE TABLE IF NOT EXISTS `definition_used` (
`id` int(10) unsigned NOT NULL,
  `word` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  `ts_created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 AUTO_INCREMENT=65 ;


ALTER TABLE `definition`
 ADD PRIMARY KEY (`id`), ADD UNIQUE KEY `page_id` (`page_id`), ADD KEY `title` (`title`);

ALTER TABLE `definition_used`
 ADD PRIMARY KEY (`id`), ADD KEY `word` (`word`,`ts_created`);

ALTER TABLE `definition`
MODIFY `id` int(10) unsigned NOT NULL AUTO_INCREMENT,AUTO_INCREMENT=2621401;

ALTER TABLE `definition_used`
MODIFY `id` int(10) unsigned NOT NULL AUTO_INCREMENT,AUTO_INCREMENT=65;

A SQLFiddle can be found here...

And I need to get a unique random word from it, since I have millions of records on the definition table, using RAND directly, is not an option.

I do have a query that will get a random word, which is this one:

SELECT r1.title
  FROM definition AS r1
  JOIN (SELECT (RAND() * (SELECT MAX(id)
                            FROM definition
                         )
               ) AS id
       ) AS r2
 WHERE r1.id >= r2.id
ORDER BY r1.id ASC
LIMIT 1

However, that will pick words based on the id, without doing any of the checks I need to. Now let's say it picked a random id of 2 million and there was no usable words past it given r1.id >= r2.id so I get no result, but if it was less it could have had a lot of results.

Right now I have came down to this:

    SELECT a.title 
      FROM definition a 
 LEFT JOIN definition_used b 
        ON a.title = b.word 
     WHERE (b.id IS NULL OR  (b.ts_created = CURDATE())) AND 
           LOWER(a.title) LIKE @message
     LIMIT 1

From the table definition_used I need to be sure that a word was not used today, in order to be reused, so a word can have multiple entries as long as the ts_created does not collide with the same date hence why I check for:

(b.id IS NULL OR  (b.ts_created = CURDATE()))

However the words that come out have 0 randomization, how can I get a random word out of the list?

I've seen some other questions where you can do that with a single table using the max id to define a random entry but I have not reference from definition table to the definition_used table other than the word itself.

  • To put it simple, I need to be able to pick a random word from the available non-used words which is what I don't know how to go about.
Prix
  • 19,417
  • 15
  • 73
  • 132

1 Answers1

0

Still looking for a better query/answer but, this is what I came down to which works, however takes about 2 seconds to get a word which I think can be further optimized so if anyone feel like giving it a shot and optimizing or posting a better query for this I will gladly accept it as the right answer.

  SELECT r1.title
    FROM definition AS r1
    JOIN (SELECT (RAND() * (SELECT MAX(a.id)
                              FROM definition a 
                         LEFT JOIN definition_used b 
                                ON a.title = b.word 
                             WHERE (b.id IS NULL OR
                                    (b.ts_created = CURDATE())
                                   ) AND 
                                   LOWER(a.title) LIKE @word
                           )
                 ) AS id
         ) AS r2
   WHERE r1.id >= r2.id
ORDER BY r1.id ASC
   LIMIT 1

This is the EXPLAIN of it in case anyone wanted to see:

id  select_type     table   type    possible_keys   key     key_len     ref     rows    Extra   
1   PRIMARY     <derived2>  system  NULL    NULL    NULL    NULL    1   
1   PRIMARY     r1  range   PRIMARY     PRIMARY     4   NULL    1293640     Using where
2   DERIVED     NULL    NULL    NULL    NULL    NULL    NULL    NULL    No tables used
3   SUBQUERY    a   index   NULL    title   767     NULL    2587281     Using where; Using index
3   SUBQUERY    b   ref     word    word    767     sebot.a.title   1   Using where; Using index
Prix
  • 19,417
  • 15
  • 73
  • 132