5

According to this post, the correct way of generating a random value in SQL Server is:

ABS(CHECKSUM(NewId())) % 14   -- Returns a value between 0 and 13

However, when using this expression in a case statement, such as:

SELECT
    CASE ABS(CHECKSUM(NEWID())) % 4 
        WHEN 0 THEN 'String A' 
        WHEN 1 THEN 'String B' 
        WHEN 2 THEN 'String C' 
        WHEN 3 THEN 'String D'
    END AS RandomString -- Returns String A, B, C, D and NULLs.
FROM sys.all_objects

The outputtet RandomString column contains some NULLs, as can be seen in this SQL fiddle. I have found that I can wrap the randomizing expression in a CTE to avoid NULLs in the output, but I still wonder why the code above returns NULLs?

WITH RandomNumber AS (
    SELECT ABS(CHECKSUM(NEWID())) % 4 AS N FROM sys.all_objects
)
SELECT TOP 100
    CASE N
        WHEN 0 THEN 'String A' 
        WHEN 1 THEN 'String B' 
        WHEN 2 THEN 'String C' 
        WHEN 3 THEN 'String D'
    END AS RandomString -- Does not return any NULLs. Only String A, B, C and D.
FROM RandomNumber

I have tried to generate the random number using a slightly different method, but the result is the same:

CAST(RAND(CHECKSUM(NEWID())) * 4 AS INT)  -- Returns a value between 0 and 3

This seems to be a problem on SQL Server 2014, I have not tested it on other versions.

Community
  • 1
  • 1
Dan
  • 10,480
  • 23
  • 49
  • For better quality random numbers you might want to consider [`CRYPT_GEN_RANDOM`](https://msdn.microsoft.com/en-us/library/cc627408.aspx?f=255&MSPPError=-2147217396). – HABO May 05 '15 at 14:21

4 Answers4

6

It generates NULLs because there's no guarantee on how many times a particular expression will be evaluated.

What you wanted was for SQL Server to do something along the lines of:

let x = GenerateRandomNumber()
if x = 1 then 'String 1'
if x = 2 then 'String 2'
if x = 3 then 'String 3'
if x = 4 then 'String 4'

(Where GenerateRandomNumber() is ABS(CHECKSUM(NEWID())) % 4); but what SQL Server actually did is:

if GenerateRandomNumber() = 1 then 'String 1'
if GenerateRandomNumber() = 2 then 'String 2'
if GenerateRandomNumber() = 3 then 'String 3'
if GenerateRandomNumber() = 4 then 'String 4'

So you only get a non-NULL result if you happen to pick the right random number for one particular comparison operation.

I think that even with the CTE, there's no guarantee that SQL Server won't ever generate something like the second code block above. If you want a stable, generated once random number, you need to arrange to store that value somewhere (e.g. in a table variable or temp table).


The reason why I focus on guarantees is that you don't want to end up writing code based on current observed behaviour. There were a lot of "issues" reported when SQL Server 2008 stopped "ordering" results in views that we're using the TOP 100 PERCENT ... ORDER BY trick - something that happened to work (mostly) on 2005 and earlier versions but ceased to do so.

Similarly, if someone asked my for an expression that returns the number 5, I could supply them the expression DATEPART(day,GETUTCDATE()) and let them run as many queryies over as many rows as they like - for the next 8 hours - but that doesn't mean that I'd recommend it as a solution to their problem.

And also, we know that there are areas where SQL Server's decisions about evaluation order can be surprising.

Damien_The_Unbeliever
  • 234,701
  • 27
  • 340
  • 448
6

The expression is being rewritten internally to the same logic as

CASE WHEN ABS(CHECKSUM(NEWID())) % 4 = 1 THEN x 
     WHEN ABS(CHECKSUM(NEWID())) % 4 = 2 THEN y 
     WHEN ...

This is why the syntax in the question will never work. The expression is being evaluated for each WHEN.

The way you can fix it is:

SELECT 
    CASE x % 4 
        WHEN 0 THEN 'String A' 
        WHEN 1 THEN 'String B' 
        WHEN 2 THEN 'String C' 
        WHEN 3 THEN 'String D'
    END AS RandomString -- Returns String A, B, C, D and NULLs.
FROM sys.all_objects
CROSS APPLY (SELECT ABS(CHECKSUM(NEWID())) x) y

Using CROSS APPLY will only calculate it once

t-clausen.dk
  • 43,517
  • 12
  • 59
  • 92
  • 1
    I'm not sure whether even `CROSS APPLY` is *guaranteed* to evaluate an expression exactly once. [This question](http://dba.stackexchange.com/questions/84464/reuse-a-complex-non-deterministic-expression-in-select-and-where-clauses) seems related. – Damien_The_Unbeliever May 05 '15 at 13:36
  • @Damien_The_Unbeliever x is calculated once, try SELECT x,x,x instead. this will always same value – t-clausen.dk May 05 '15 at 13:40
  • I could run your test 10000 times and yes, observe, that the value is always the same. But then on the 10001st time, I might get a different result, for whatever reason (bugfix/service pack, time of day, load on server, etc). That's why I focus on *guarantees* rather than on *observed behaviour*. After all, SQL Server *could* have taken the OPs original query and behaved how they expected it to work - would you then say the OPs original query was correct? – Damien_The_Unbeliever May 05 '15 at 13:43
  • I tried on a table with 100.000.000 rows. Can provide the script if needed. All the results gave the same values for x,x,x – t-clausen.dk May 05 '15 at 13:52
  • 1
    But again, that's *observed behaviour*, not *guaranteed behaviour*. If it was *documented* to evaluate each expression once, I wouldn't be raising this. But that's the point, it is *not* so documented. – Damien_The_Unbeliever May 05 '15 at 13:56
  • 1
    @Damien_The_Unbeliever But it just would not make sense to read that expression multiple times. It would just go against data consistency and efficiency. A join to a table will take a read lock to assure consistent data. The select statement processes after all joins and wheres. http://blog.sqlauthority.com/2009/04/06/sql-server-logical-query-processing-phases-order-of-statement-execution/ – paparazzo May 05 '15 at 15:14
  • @Damien_The_Unbeliever I saw you link that a select processed before a where so I just that is not a string processing order. But I hold with it would just not make sense for it to process that cross apply expression multiple times. It would violate a transaction in my mind. – paparazzo May 05 '15 at 15:24
  • @Blam - the argument from efficiency would be that no expression would be evaluated more than once. But, clearly, given the OPs question where they've observed different behaviour, this is not the case. – Damien_The_Unbeliever May 05 '15 at 17:11
  • @Damien_The_Unbeliever That expression and any expression are not the same. I specifically address join (apply). Select is another thing. – paparazzo May 05 '15 at 17:19
0

Well as far as why, I'm not really sure. But as far as your null problem, making it a derived table and querying that worked. Don't really know why.

SELECT  CASE rand_num
            WHEN 0 THEN 'String A' 
            WHEN 1 THEN 'String B' 
            WHEN 2 THEN 'String C' 
            WHEN 3 THEN 'String D'
    END AS RandomString -- Returns String A, B, C, D and NULLs.
FROM
(
    SELECT ABS(CHECKSUM(NEWID())) % 4 rand_num
    FROM sys.all_objects
) A
WHERE rand_num BETWEEN 0 AND 3
Stephan
  • 5,891
  • 1
  • 16
  • 24
  • Again, as with the CTE, and as with the `CROSS APPLY`, whilst things may *seem* to be stable I'm not sure that there's a *guarantee* that the random expression will not be evaluated multiple times when considering the `CASE` expression. – Damien_The_Unbeliever May 05 '15 at 13:37
  • Well a where clause will guarantee no NULL values. As far as accurate randomization, I didn't check for that. – Stephan May 05 '15 at 13:43
  • 1
    Thanks @Stephan for your input. Regarding accurate randomization I would suspect that if the randomizing expression is evaluated once for each `WHEN` then 'String A' would appear slightly more often than 'String B', 'String B' more often than 'String C' etc. – Dan May 05 '15 at 14:14
0

This is a well thought out discussion on random numbers and nulls. The best answer and explanation I've seen on this problem is from Aaron Bertrand's article, "Dirty Secrets of the CASE Expression," which is published at sqlperformance.com. In my situation, I combined Query D with CROSS APPLY to get the results I was looking for.

enter image description here

datalifenyc
  • 2,028
  • 1
  • 20
  • 18