8

Background : I was trying to get some random 'hex' values while creating dummy data and came up with this construction :

SELECT TOP 100 
   result = (CASE ABS(Binary_Checksum(NewID())) % 16 
                WHEN -1 THEN 'hello'
                WHEN 0 THEN '0' 
                WHEN 1 THEN '1' 
                WHEN 2 THEN '2' 
                WHEN 3 THEN '3'
                WHEN 4 THEN '4' 
                WHEN 5 THEN '5' 
                WHEN 6 THEN '6' 
                WHEN 7 THEN '7'
                WHEN 8 THEN '8' 
                WHEN 9 THEN '9' 
                WHEN 10 THEN 'a' 
                WHEN 11 THEN 'b'
                WHEN 12 THEN 'c' 
                WHEN 13 THEN 'd' 
                WHEN 14 THEN 'e' 
                WHEN 15 THEN 'f' 
                ELSE 'huh'  END)
          FROM sys.objects 

When running this on my SQL Server 2008 R2 instance, I get quite a lot of 'huh' records:

result
------
huh
3
huh
huh
6
8
6

I really don't understand why. What I would expect to happen is :

  • for every record NewID() comes up with a new random value
  • Binary_Checksum() calculates an int based on said value
  • ABS() makes the value positive
  • % 16 returns the remainder of that positive value if it would be divided by 16, which then would be a value between 0 and 15
  • the CASE construction converts the value to a relevant character
  • Since there are WHENs for every value between 0 and 15, the ELSE should never be needed

or at least, that is what I would think should happen... but obviously something goes wrong along the road...

When doing the same thing in a two-step approach (via temp-table), the huh's are gone...

SELECT TOP 100 x = ABS(Binary_Checksum(NewID())) % 16,
               result = 'hello'
  INTO #test
  FROM sys.objects

UPDATE #test 
   SET result = (CASE x WHEN 0 THEN '0' WHEN 1 THEN '1' WHEN 2 THEN '2' WHEN 3 THEN '3'
                        WHEN 4 THEN '4' WHEN 5 THEN '5' WHEN 6 THEN '6' WHEN 7 THEN '7'
                        WHEN 8 THEN '8' WHEN 9 THEN '9' WHEN 10 THEN 'a' WHEN 11 THEN 'b'
                        WHEN 12 THEN 'c' WHEN 13 THEN 'd' WHEN 14 THEN 'e' WHEN 15 THEN 'f' 
                        ELSE 'huh'  END)

SELECT * FROM #test

Anyone who understands this ? As far as I can tell it should give the same result (it IS copy-paste indeed) regardless of me doing it directly or via a temp-table... But obviously something goes wrong if I do it in a single statement.

PS: I don't need a 'fix' for this, I already have a workaround (see below), I merely am hoping someone can explain me why this does what it does.

Workaround :

SELECT TOP 100 result = SubString('0123456789abcdef', 1 + (ABS(Binary_Checksum(NewID())) % 16), 1) 
  FROM sys.objects
Jon Egerton
  • 40,401
  • 11
  • 97
  • 129
deroby
  • 5,902
  • 2
  • 19
  • 33
  • PS: if anyone could test this on different versions of SQL-Server that might be interesting too... – deroby Dec 15 '11 at 09:50
  • Your workaround looks more succinct than the original... – Russell Dec 15 '11 at 09:54
  • Thx for explaining this guys...going with Damiens answer as it is a bit more verbose for anyone who might stumble upon this. (and he was 1 minute faster on top =P ) – deroby Dec 15 '11 at 10:07

2 Answers2

8

The compute scalar in the plan has the following formula

[Expr1038] = Scalar Operator(CASE WHEN abs(binary_checksum(newid()))%(16)=(-1) THEN 'hello' ELSE CASE WHEN abs(binary_checksum(newid()))%(16)=(0) THEN '0' ELSE CASE WHEN abs(binary_checksum(newid()))%(16)=(1) THEN '1' ELSE CASE WHEN abs(binary_checksum(newid()))%(16)=(2) THEN '2' ELSE CASE WHEN abs(binary_checksum(newid()))%(16)=(3) THEN '3' ELSE CASE WHEN abs(binary_checksum(newid()))%(16)=(4) THEN '4' ELSE CASE WHEN abs(binary_checksum(newid()))%(16)=(5) THEN '5' ELSE CASE WHEN abs(binary_checksum(newid()))%(16)=(6) THEN '6' ELSE CASE WHEN abs(binary_checksum(newid()))%(16)=(7) THEN '7' ELSE CASE WHEN abs(binary_checksum(newid()))%(16)=(8) THEN '8' ELSE CASE WHEN abs(binary_checksum(newid()))%(16)=(9) THEN '9' ELSE CASE WHEN abs(binary_checksum(newid()))%(16)=(10) THEN 'a' ELSE CASE WHEN abs(binary_checksum(newid()))%(16)=(11) THEN 'b' ELSE CASE WHEN abs(binary_checksum(newid()))%(16)=(12) THEN 'c' ELSE CASE WHEN abs(binary_checksum(newid()))%(16)=(13) THEN 'd' ELSE CASE WHEN abs(binary_checksum(newid()))%(16)=(14) THEN 'e' ELSE CASE WHEN abs(binary_checksum(newid()))%(16)=(15) THEN 'f' ELSE 'huh' END END END END END END END END END END END END END END END END END)

The random number is repeatedly re-evaluated rather than being evaluated once and kept constant throughout each branch of the CASE statement.

The (fixed) proposed solution in Damien's answer does work for me

SELECT TOP 100 
    result = (CASE ABS(Binary_Checksum(Value)) % 16 
            WHEN -1 THEN 'hello'
            /*...*/
            ELSE 'huh'  END)
          FROM (select NewID() as Value,* from sys.objects ) so

Because the plan has 2 compute scalar operators. The first one with definition

[Expr1038] = Scalar Operator(newid())

Plan

Then that constant expression Expr1038 is fed into the CASE expression. I'm not sure that this behavior is absolutely guaranteed however. It may be subject to the whims of the optimiser.

Martin Smith
  • 438,706
  • 87
  • 741
  • 845
  • LOL, that would explain it indeed. So internally a CASE x WHEN 1 THEN a WHEN 2 THEN b .... is converted to CASE WHEN x = 1 THEN a WHEN x = 2 THEN b etc... Makes sense. – deroby Dec 15 '11 at 09:59
  • @deroby - Yep. Pretty much. Not sure there is a 100% guaranteed alternative to avoid this. – Martin Smith Dec 15 '11 at 10:03
  • I guess using a CTE, CROSS APPLY or CROSS JOIN one could work around this issue; but like someone already pointed out, the given work-around actually looks cleaner =) – deroby Dec 15 '11 at 10:17
  • 3
    @deroby - I added a bit to my answer about that. It **appears** to work but I doubt you'll find any cast iron guarantees that it will always work. SQL Server generally reserves the right to re-order expressions in the tree as it sees fit [as explained well in this answer](http://stackoverflow.com/questions/5191701/tsql-divide-by-zero-encountered-despite-no-columns-containing-0/5203211#5203211) – Martin Smith Dec 15 '11 at 10:22
  • If only I could accept 2 answers... I'll up-vote your answer so you both get something out of this =) Anyway, you're probably right that the whole thing never will be 100% reliable due to its very nature. Only more reason to stick to the work-around. PS: if really, really needed, we might try to `USE PLAN `, but even I think that would be overkill for the problem at hand and even there I wonder how secure it would be in the end. I guess the lesson at hand is to be very careful with non-deterministic functions... – deroby Dec 15 '11 at 10:37
  • 2
    @deroby - You can't use a plan guide to force compute scalars unfortunately. Any compute scalars in the XML plan are ignored and you may still end up with a different plan. – Martin Smith Dec 15 '11 at 10:39
3

I believe that, contrary to the description of the simple CASE expression, that it actually re-evaluates input_expression for each input_expression = when_expression comparison (this would normally be safe, unless, as in this case, there's a non-deterministic function in input_expression)

So, what happens is that it keeps generating different random numbers between 0 and 15 for each comparison, and the huhs come out if, after 16 evaluations/comparisons, it never generated a matching number.


This doesn't generate huhs:

SELECT TOP 100 
    result = (CASE ABS(Binary_Checksum(Value)) % 16 
            WHEN -1 THEN 'hello'
            WHEN 0 THEN '0' 
            WHEN 1 THEN '1' 
            WHEN 2 THEN '2' 
            WHEN 3 THEN '3'
            WHEN 4 THEN '4' 
            WHEN 5 THEN '5' 
            WHEN 6 THEN '6' 
            WHEN 7 THEN '7'
            WHEN 8 THEN '8' 
            WHEN 9 THEN '9' 
            WHEN 10 THEN 'a' 
            WHEN 11 THEN 'b'
            WHEN 12 THEN 'c' 
            WHEN 13 THEN 'd' 
            WHEN 14 THEN 'e' 
            WHEN 15 THEN 'f' 
            ELSE 'huh'  END)
          FROM (select NewID() as Value,* from sys.objects ) so
Johan
  • 1,152
  • 7
  • 16
Damien_The_Unbeliever
  • 234,701
  • 27
  • 340
  • 448
  • So it seems. Makes sense once you guys explain it but I wouldn't have thought about it on my own. Thx. – deroby Dec 15 '11 at 10:02
  • 1
    "This doesn't generate huhs" in your test but can you point to any docs saying SQL Server is guaranteed to only evaluate it once? – Martin Smith Dec 15 '11 at 10:04
  • 2
    That should be: `result = (CASE ABS(Binary_Checksum(Value)) % 16` – Johan Dec 15 '11 at 10:05
  • @Damien_The_Unbeliever : Your query also generates "huhs". – Upendra Chaudhari Dec 15 '11 at 10:10
  • @MartinSmith - I certainly found it to be *implied* by this phrase in the description of simple case: "Evaluates `input_expression`, and then in the order specified, evaluates `input_expression = when_expression` for each WHEN clause." – Damien_The_Unbeliever Dec 15 '11 at 10:52
  • @UpendraChaudhari - sorry, (@JOhan has fixed). It was right first, but had the very wide formatting from the original question. When I C&Ped the revised formatting (each `WHEN` on a new line), I inadvertently reverted `Value` to `NewID()` also. – Damien_The_Unbeliever Dec 15 '11 at 10:54
  • @Damien_The_Unbeliever - But `input_expression` is still based on a scalar function (albeit pushed into a table expression) and [see the comments section here](http://connect.microsoft.com/SQLServer/feedback/details/350485/bug-with-newid-and-table-expressions) for some interesting discussion about those. Although the plan does indicate SQL Server might make attempts to constify non deterministic functions I haven't seen any documentation about when this will occur. – Martin Smith Dec 15 '11 at 11:05
  • @MartinSmith - it's the first clause of the quote that makes it sound like it should be evaluated once. Why have that first clause ("Evaluates `input_expression`"), if `input_expression` is going to be re-evaluated for each `input_expression = when expression` comparison? – Damien_The_Unbeliever Dec 15 '11 at 11:21
  • @Damien_The_Unbeliever - But the example in the OP shows that this interpretation is not correct and that `input_expression` can indeed be repeatedly re-evaluated as the first part of your answer states. – Martin Smith Dec 15 '11 at 11:23
  • @MartinSmith - exactly, I'd say the documentation gives a misleading impression, by having that clause. – Damien_The_Unbeliever Dec 15 '11 at 11:26