4

It's well-documented how to retrieve one record quickly or how to retrieve multiple records inefficiently (by plucking all IDs). I'm wondering what is the fastest way to retrieve N records from a table of millions of records.

I've found with a 3M row MariaDB table, plucking all IDs takes 10+ seconds and ordering by rand() takes over a minute. This makes me think N separate random offset calls (after finding the total table count) could be quicker, assuming N is relatively low. I'm still wondering if there's a faster way or if not, some trick to make the random offset calls in a single query.

Community
  • 1
  • 1
mahemoff
  • 44,526
  • 36
  • 160
  • 222
  • if you are just pulling random records would it make sense to add a where clause so all 'random' records are pulled from a smaller time range so the rand ordering would be against less records? say records created in the last 2 weeks – house9 Jun 07 '14 at 15:40
  • @house9 In some cases, it could. I'm thinking of the more generic case of sampling anything. – mahemoff Jun 07 '14 at 18:06
  • It would be faster to do 30 queries, with a random offset and a limit of 1, than to select every Id from 3 million rows just to shuffle them and take the first 30. – user229044 Jun 08 '14 at 14:51
  • I'd think so, but 30 queries doesn't seem too efficient either, so I wondered if there's an even faster way. – mahemoff Jun 16 '14 at 09:58

1 Answers1

1

I happened to be reading and thinking about this recently! The best solution I came across was the one given at http://explainextended.com/2009/03/01/selecting-random-rows/: create a random function that can be applied iteratively, row by row, along the lines of this query from the page (for n=10):

SELECT  rnd_id, rnd_value
FROM    (
        SELECT  @cnt := COUNT(*) + 1,
                @lim := 10
        FROM    t_random_innodb
        ) vars
STRAIGHT_JOIN
        (
        SELECT  r.*,
                @lim := @lim - 1
        FROM    t_random_innodb r
        WHERE   (@cnt := @cnt - 1)
                AND RAND() < @lim / @cnt
        ) i

Note that you can't use this approach if you have additional where conditions--indeed I don't think there's a particularly efficient way of solving it in this case, while reliably getting the required number of results.

histocrat
  • 2,291
  • 12
  • 21