-1

This was an interview question. As the question says The interviewer basically asked me to get 10 random records from a database. It looks easy if you know that there are n records. But in my case. I'm not allowed to search for the number of records.

What's an efficient way to do this?

Wilmer E. Henao
  • 4,094
  • 2
  • 31
  • 39
  • 1
    `select * from TABLE LIMIT 10` ? – Nir Alfasi Oct 22 '13 at 16:36
  • 2
    It depends on how random you want it. Most RDBMSs don't guarantee ordering, so `select * from table where rownum < 11` (or `select top 10 * from table`) will be random-ish. – Joe Oct 22 '13 at 16:37
  • 5
    What RDBMS? What table structure? Why would knowing the number of records in the table help? – Martin Smith Oct 22 '13 at 16:37
  • 2
    To continue @MartinSmith's question - what should you return if there simply aren't 10 records in the table? Or are you allowed to assume that there are? – Mureinik Oct 22 '13 at 16:38
  • Is there a specific RDBMS you should be targetting? You can `ORDER BY` something like `NEWID()`. Obviously not cryptographically secure, but good enough for casual use. – zimdanen Oct 22 '13 at 16:38
  • The most efficient way to do this would be to google, "your rdbms random records" and see what you get. – Dan Bracuk Oct 22 '13 at 16:39
  • By default, RDBMs return random records (arbitrarily chosen based on optimization unless `OrderBy` specified). So, `select top 10 * from table` will return random records each time. – Ravi Trivedi Oct 22 '13 at 16:40
  • I often do a quickie technique like selecting on a substring of a field. Last character of an account number or ID (as string) is good (not first), and also some character form within a name (say the third one). Select for a value and limit to 10. Mileage may vary based on what you use, but nice quick way to get a sample. – asantaballa Oct 22 '13 at 16:41
  • Why do you need the number of records? Did he specify what he measn by "random"? Is is "random" like "arbitrary" or rather like "randomly distributed"? – Stefan Steinegger Oct 23 '13 at 11:10

2 Answers2

3
SELECT * FROM table ORDER BY RAND() LIMIT 10
AwokeKnowing
  • 7,728
  • 9
  • 36
  • 47
  • Love this approach, @AwokeKnowing. But remember the Rand() function needs to be seeded to give different result for each row. – asantaballa Oct 22 '13 at 16:56
  • well, i'd have to check on that. it will get random rows. but possibly, each time you run the query it might get the same rows if nothing has changed. you can seed it with the time, if necessary – AwokeKnowing Oct 22 '13 at 16:59
  • well, I believe mysql takes care of that, since this is what it suggests in the manual. again, you can seed it with the time, if necessary – AwokeKnowing Oct 22 '13 at 17:07
  • Was using SQL Server so was getting the same value, but the time seed is good. And I like the cleanness of the approach. – asantaballa Oct 22 '13 at 17:42
2

This appears to be basically a duplicate of SQL Server Random Sort which is basically a duplicate of How to request a random row in SQL?.

The latter has a comprehensive answer for multiple RDBMSs referencing this post:

SQL to Select a random row from a database table

An answer for Microsoft SQL Server would be:

SELECT TOP 10 * FROM table ORDER BY NEWID();

This will not perform well on large tables. It scans the entire table, generating a unique number (a 16-bit GUID) for each row, and then sorts the results by that unique number.

Simply ordering by RAND() in SQL Server will not result in a random list of records. RAND() is evaluated once at the beginning of the statement, so you are effectively ordering by a constant, which isn't really ordering at all. You'll get the same results without the ORDER BY. Indeed, in my instance of SQL Server 2005, the query plans and results were the same with and without the ORDER BY RAND().

RAND() in SQL Server takes a seed value, so you might think that you could pass a varying table column value into the RAND function and get random results. In some sense, you can. You could pass an IDENTITY or other unique column into the RAND function and you won't get the same order as without. And the order will be random in the sense that it appears so to a casual observer. But it's repeatable. The RAND() function in SQL Server will always return the same value for the same seed on the same connection:

"For one connection, if RAND() is called with a specified seed value, all subsequent calls of RAND() produce results based on the seeded RAND() call."

http://technet.microsoft.com/en-us/library/ms177610.aspx

So while you would get what appeared to be a random list, if you executed it multiple times in the same connection, you would get the same list. Depending on your requirements, that might be good enough.

Based on my limited tests on a small table, the RAND with a unique column seed had a very slightly lower estimated query cost.

Community
  • 1
  • 1
Riley Major
  • 1,904
  • 23
  • 36