4

I'm trying to select a random 10% sampling from a small table. I thought I'd just use the RAND() function and select those rows where the random number is less than 0.10:

SELECT * FROM SomeTable
WHERE SomeColumn='SomeCondition' AND
      RAND() < 0.10

But I soon discovered that RAND() always returns the same number! Reminds me of this xkcd cartoon.

OK, no problem, the RAND function takes a seed value. I will be running this query periodically, and I want it to give different results if I run it on a different day, so I seed it with a combination of the date and a unique row ID:

SELECT * FROM SomeTable
WHERE SomeColumn='SomeCondition' AND
      RAND(CAST(GETDATE) AS INTEGER) + RowID) < 0.10

I still don't get any results! When I show the random numbers returned by RAND, I discover that they're all within a narrow range. It appears that getting a random number from RAND requires you to use a random seed. If I had a random seed in the first place, I wouldn't need a random number!

I've seen the previous discussions related to this problem:

SQL Server Random Sort
How to request a random row in SQL?

They don't help me. TABLESAMPLE works at the page level, which is great for a big table but not for a small one, and it looks like it applies prior to the WHERE clause. TOP with NEWID doesn't work because I don't know ahead of time how many rows I want.

Anybody have a solution, or at least a hint?

Edit: Thanks to AlexCuse for a solution which works for my particular case. Now to the larger question, how to make RAND behave?

Community
  • 1
  • 1
Mark Ransom
  • 299,747
  • 42
  • 398
  • 622

5 Answers5

6

This type of approach (shown by ΤΖΩΤΖΙΟΥ) will not guarantee a 10% sampling. It will only give you all rows where Rand() is evaluated to < .10 which will not be consistent.

Something like

select top 10 percent * from MyTable order by NEWID()

will do the trick.

edit: there is not really a good way to make RAND behave. This is what I've used in the past (kludge alert - it kills you not being able to use Rand() in a UDF)

CREATE VIEW RandView AS 

SELECT RAND() AS Val

GO

CREATE FUNCTION RandomFloat()
RETURNS FLOAT
AS
BEGIN

RETURN (SELECT Val FROM RandView)

END

Then you just have select blah, dbo.RandomFloat() from table in your query.

AlexCuse
  • 18,008
  • 5
  • 42
  • 51
  • An approximation to 10% was good enough for me, but your answer solves my immediate problem nicely. I should have thought to check for a PERCENT clause on TOP. – Mark Ransom Oct 02 '08 at 18:42
2

If your table has a column (perhaps even the rowid column) that is numeric in the general sense, like integer, floating point or SQL numeric, please try the following:

SELECT * FROM SomeTable WHERE SomeColumn='SomeCondition' AND 0*rowid+RAND() < 0.10

In order to evaluate RAND() once for every row, not once at the start of your query.

The query optimizer is to blame. Perhaps there is another way, but I believe this will work for you.

Peter O.
  • 32,158
  • 14
  • 82
  • 96
tzot
  • 92,761
  • 29
  • 141
  • 204
  • 1
    That method isn't working for me. I only have varchar and int, and I don't understand why int would behave different than numeric. – Mark Ransom Oct 02 '08 at 17:48
  • If it's the optimizer then it's very much like the cartoon! – Ken Oct 02 '08 at 17:51
  • Perhaps it's my fault, but saying "numeric" I meant any numeric type (integer, shortint, float, "SQL" numeric etc). Check my answer again, please. – tzot Oct 02 '08 at 17:59
1

This seems to work:

select * from SomeTable
where rand(0*SomeTableID + cast(cast(newid() as binary(4)) as int)) <= 0.10
Jason DeFontes
  • 2,235
  • 15
  • 14
0

Did you see this question?

How do I return random numbers as a column in SQL Server 2005?

Adam posted a UDF you can use in place of Rand() that works much better.

Community
  • 1
  • 1
Joshua Carmody
  • 13,410
  • 16
  • 64
  • 83
0

This seems to work

SELECT TOP 10 PERCENT * FROM schema.MyTable ORDER BY NEWID()
Amarnath Balasubramanian
  • 9,300
  • 8
  • 34
  • 62
Jonas Stensved
  • 14,378
  • 5
  • 51
  • 80