0

I'm trying to generate some randomized data, and I've been using newid() to seed functions since it is called once for every row and is guaranteed to return a different result each time. However I'm frequently getting values that are somehow not equal to any integers in the expected range.

I've tried a few variations, including a highly upvoted one, but they all result in the same issue. I've put it into a script that shows the problem:

declare @test table (id uniqueidentifier)
insert into @test
select newid() from sys.objects

select 
    floor(rand(checksum(id)) * 4),
    case isnull(floor(rand(checksum(id)) * 4), -1)
        when 0 then 0
        when 1 then 1
        when 2 then 2
        when 3 then 3
        when -1 then -1
        else 999
    end,
    floor(rand(checksum(newid())) * 4),
    case isnull(floor(rand(checksum(newid())) * 4), -1)
        when 0 then 0
        when 1 then 1
        when 2 then 2
        when 3 then 3
        when -1 then -1
        else 999
    end
from @test

I expect the results to always be in the range 0 to 3 for all four columns. When the unique identifiers are retrieved from a table, the results are always correct (first two columns.) Similarly, when they're output on the fly they're also correct (third column.) But when they're compared on the fly to integers in a case statement, it often returns a value outside the expected range.

Here's an example, these are the first 20 rows when I ran it just now. As you can see there are '999' instances in the last column that shouldn't be there:

0   0   3   1
3   3   3   1
0   0   3   3
3   3   2   999
1   1   2   999
3   3   2   1
2   2   0   999
0   0   0   0
3   3   2   0
1   1   3   999
3   3   0   999
2   2   2   2
1   1   3   0
2   2   3   0
3   3   1   999
0   0   1   999
3   3   1   1
0   0   0   3
3   3   0   999
0   0   1   0

At first I thought maybe the type coercion was different than I expected, and the result of rand() * int was a float not an int. So I wrapped it all in floor to force it to be an int. Then I thought perhaps there's an odd null value creeping in, but with my case statement a null would be returned as -1, and there are none.

I've run this one two different SQL Server 2012 SP1 instances, both give the same sort of results.

Community
  • 1
  • 1
Rick
  • 4,575
  • 1
  • 26
  • 20
  • Before anyone asks: newid() returns a guid, checksum() calculates a checksum and returns a value between minInt and maxInt, rand() returns a random float value between 0 and 1, floor() returns an int nearest to but less than the parameter it was given. They're standard functions in SQL Server, not anything I've created. – Rick Oct 02 '15 at 23:44

1 Answers1

2

In the fourth column, isnull(floor(rand(checksum(newid())) * 4), -1) is being evaluated up to five times for each row. Once for each branch of the case. On each call the values can be different. So it can return 2, not match 1, 3 not match 2, 1 not match 3, 3 not match 4 fall to the else and return 999.

This can be seen if you get the execution plan, and look at the XML, there is a line [whitespace added.]:

<ScalarOperator ScalarString="
CASE WHEN isnull(floor(rand(checksum(newid()))*(4.000000000000000e+000)),(-1.000000000000000e+000))=(0.000000000000000e+000) THEN (0) 
    ELSE CASE WHEN isnull(floor(rand(checksum(newid()))*(4.000000000000000e+000)),(-1.000000000000000e+000))=(1.000000000000000e+000) THEN (1) 
        ELSE CASE WHEN isnull(floor(rand(checksum(newid()))*(4.000000000000000e+000)),(-1.000000000000000e+000))=(2.000000000000000e+000) THEN (2) 
            ELSE CASE WHEN isnull(floor(rand(checksum(newid()))*(4.000000000000000e+000)),(-1.000000000000000e+000))=(3.000000000000000e+000) THEN (3) 
                ELSE CASE WHEN isnull(floor(rand(checksum(newid()))*(4.000000000000000e+000)),(-1.000000000000000e+000))=(-1.000000000000000e+000) THEN (-1) 
                    ELSE (999) 
                END 
            END 
        END 
    END 
END
">

Placing the expression in a CTE seems to keep the recomputes from happening:

; WITH T AS (SELECT isnull(floor(rand(checksum(newid())) * 4), -1) AS C FROM @Test)
SELECT CASE C
        when 0 then 0
        when 1 then 1
        when 2 then 2
        when 3 then 3
        when -1 then -1
        else 999 END
FROM T
Shannon Severance
  • 18,025
  • 3
  • 46
  • 67
  • Well spotted, thanks! Doesn't show up in the graphical execution plan except to say it's evaluating expressions. I guess I should look at the text version instead from now on! – Rick Oct 03 '15 at 00:36