13

How do we combine How to request a random row in SQL? and Multiple random values in SQL Server 2005 to select N random rows using a single pure-SQL query? Ideally, I'd like to avoid the use of stored procedures if possible. Is this even possible?

CLARIFICATIONS:

  1. Pure SQL refers to as close as possible to the ANSI/ISO standard.
  2. The solution should be "efficient enough". Granted ORDER BY RAND() might work but as others have pointed out this isn't feasible for medium-sized tables.
Community
  • 1
  • 1
Gili
  • 86,244
  • 97
  • 390
  • 689

5 Answers5

7

The answer to your question is in the second link there:

SELECT * FROM table ORDER BY RAND() LIMIT 1

Just change the limit, and/or rewrite for SQL Server:

SELECT TOP 1 * FROM table ORDER BY newid()

Now, this strictly answers your question, but you really shouldn't be using this solution. Just try it on a large table and you'll see what I mean.

If your key-space is sequential, either without holes, or with very few holes, and if it has very few holes, you're not too concerned that some rows have a slightly higher chance of being picked than others, then you can use a variation where you calculate which key you want to retrieve randomly, ranging from 1 to the highest key in your table, and then retrieve the first row that has a key equal to or higher than the number you calculated. You only need the "higher than" part if your key-space has holes.

This SQL is left as an excercise for the reader.


Edit: Note, a comment to another answer here mentions that perhaps pure SQL means ANSI standard SQL. If that is the case, then there is no way, since there is no standardized random function, nor does every database engine treat the random number function the same way. At least one engine I've seen "optimizes" the call by calling it once and just repeating the calculated value for all rows.

Lasse V. Karlsen
  • 380,855
  • 102
  • 628
  • 825
  • NEWID() is a bad idea if you want truly random samples, GUIDs have a lot of structure. If you don't care about being really random though, go ahead. – user12861 Dec 31 '08 at 14:05
2

Here's a potential solution, that would let you balance the risk of getting less than N rows against a sampling bias from the "front" of the table. This assumes that N is small compared to the size of the table:

select * from table where random() < (N / (select count(1) from table)) limit N;

This will generally sample most of the table, but can return less than N rows. If having some bias is acceptable, the numerator can be changed from N to 1.5*N or 2*N to make it very likely that N rows will be returned. Additionally, if it's necessary to randomize the row order, not just select a random subset:

select * from (select * from table
                where random() < (N / (select count(1) from table)) limit N)
 order by mod(tableid,1111);

The downside of this solution is that, at least in PostgreSQL, it uses a sequential scan of the table. A larger numerator will speed up the query.

mtillberg
  • 21
  • 1
2

I don't know about pure ANSI, and it's not simple, but you can check out my answer to a similar question here: Simple Random Samples from a Sql database

Community
  • 1
  • 1
user12861
  • 2,358
  • 4
  • 23
  • 41
  • It's not clear to me how to implement what you propose if assumption #3 is false (that is, your table has holes). – Gili Dec 31 '08 at 00:21
  • You have to rewrite the whole table so assumption #3 is true, so it's a very slow O(n) operation. Create a new table with the same columns as the original table, and also an identity column for a new primary key that will have no gaps. Then insert the whole original table into the new one. – user12861 Dec 31 '08 at 14:04
-1

That's may help you:

SELECT TOP 3 * FROM TABLE ORDER BY NEWID()
dipi evil
  • 489
  • 1
  • 11
  • 20
  • -1, already covered by http://stackoverflow.com/a/396946/14731 and isn't Pure SQL (newid() is Microsoft-specific). – Gili Apr 18 '13 at 13:14
-2

Using below code you can achieve the same you are looking for..

select top 1 * from student1 order by newid()

change value of N where top 1 so you will receive that number of random records.

MarmiK
  • 5,639
  • 6
  • 40
  • 49
killing007
  • 11
  • 2
  • -1, already covered by stackoverflow.com/a/396946/14731 and isn't Pure SQL (newid() is Microsoft-specific). – Gili Apr 12 '14 at 19:07