0

I wonder if someone knows how can I generate in Sql Server random values within a range uniformly distributed. This is what I did:

SELECT ID, AlgorithmType, AlgorithmID
FROM TEvaluateAlgorithm

I want AlgorithmID takes values from 0 to 15, has to be uniformly distributed

UPDATE TEA SET TEA.AlgorithmID = FLOOR(RAND(CONVERT(VARBINARY, NEWID()))*(16))
-- FROM TEvaluateAlgorithm TEA

I do not know what happen with the random, but is not distributing uniform random values between 0 and 15, not with the same amount. For example from 0 to 9 is greater than from 10 to 15.

Thanks in advance!

EDITED:

Here is my data you can see the difference...

AlgorithmID COUNT(*)

    0   22254
    1   22651
    2   22806
    3   22736
    4   22670
    5   22368
    6   22690
    7   22736
    8   22646
    9   22536
    10  14479
    11  14787
    12  14553
    13  14546
    14  14574
    15  14722
user2112420
  • 955
  • 5
  • 11
  • 26

2 Answers2

2

rand() doesn't do a good job with this. Because you want integers, I would suggest the following:

select abs(checksum(newid()) % 16

I just checked this using:

select val, count(*)
from (select abs(checksum(newid()) % 16
      from master..spt_values
     ) t
group by val
order by val;

and the distribution looks reasonable.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • I have read this post, http://stackoverflow.com/questions/1045138/how-do-i-generate-random-number-for-each-row-in-a-tsql-select he says there is a very slight bias in the code. – user2112420 Jul 03 '15 at 07:27
0

Here's a quick proof of concept.

Set @Loops to something big enough to make the statistics meaningful. 50k seems like a decent starting point.

Set @MinValue to the lowest integer in your set and set @TotalValues to how many integers you want in your set. 0 and 16 get you the 16 values [0-15], as noted in the question.

We're going to use a random function to cram 50k outputs into a temp table, then run some stats on it...

DECLARE @MinValue int
DECLARE @TotalValues int

SET @MinValue = 0
SET @TotalValues = 16

DECLARE @LoopCounter bigint
SET @LoopCounter = 0

DECLARE @Loops bigint
SET @Loops = 50000

CREATE TABLE #RandomValues
(
    RandValue int
)

WHILE @LoopCounter < @Loops
    BEGIN

        INSERT INTO #RandomValues (RandValue) VALUES (FLOOR(RAND()*(@TotalValues-@MinValue)+@MinValue))
        --you can plug into the right side of the above equation any other randomize formula you want to test
        SET @LoopCounter = @LoopCounter + 1

    END


--raw data query
SELECT
    RandValue AS [Value],
    COUNT(RandValue) AS [Occurrences],
    ((CONVERT(real, COUNT(RandValue))) / CONVERT(real, @Loops)) * 100.0 AS [Percentage]
FROM
    #RandomValues
GROUP BY
    RandValue
ORDER BY
    RandValue ASC

--stats on your random query

SELECT
    MIN([Percentage]) AS [Min %],
    MAX([Percentage]) AS [Max %],
    STDEV([Percentage]) AS [Standard Deviation]
FROM
    (   
    SELECT
        RandValue AS [Value],
        COUNT(RandValue) AS [Occurrences],
        ((CONVERT(real, COUNT(RandValue))) / CONVERT(real, @Loops)) * 100.0 AS [Percentage]
    FROM
        #RandomValues
    GROUP BY
        RandValue
    --ORDER BY
    --  RandValue ASC
    ) DerivedRawData

DROP TABLE #RandomValues

Note that you can plug in any other randomizing formula into the right side of the INSERT statement within the WHILE loop then re-run to see if you like the results better. "Evenly distributed" is kinda subjective, but the standard deviation result is quantifiable and you can determine if it is acceptable or not.

LDMJoe
  • 1,591
  • 13
  • 17
  • No feedback, just the downvote? Exactly why is this not helpful in proving or disproving how "uniformly distributed" a random function is behaving? – LDMJoe Jul 02 '15 at 18:37