0

In SQL Server all I want to do is select one field and assign a random number to each row. There will be over 1M rows, and I will then order by the random numbers to select the top 50K as a random sample.

If i do:

Select x, rand()*(50000-1)+1) 
From Y 

I get all of the x that I want, but they each have the same random number.

  • Just remember to take out the parenthesis after "+1" if you're not using FLOOR or something else when you're generating your number. – Dresden Apr 27 '16 at 18:00

2 Answers2

1

You probably don't need to assign any values, just order by a pseudo-random value and take the TOP:

SELECT TOP 50000 X FROM Y ORDER BY NEWID()
Cᴏʀʏ
  • 105,112
  • 20
  • 162
  • 194
  • Thank you. Can you briefly explain how this works? NewID is new to me – user6114185 Apr 27 '16 at 18:04
  • @user6114185: `NEWID()` returns a GUID, so a new GUID is assigned temporary on each row so the `ORDER BY` works. Since GUIDs are, for all intents and purposes, guaranteed to never repeat, and they aren't generated sequentially anyway, we can basically trust that it will be random enough for you. – Cᴏʀʏ Apr 27 '16 at 18:06
0

If you only use the random number to sort randomly, you could instead skip the random number field and use

ORDER BY NEWID()
Frode
  • 3,325
  • 1
  • 22
  • 32