
I'm having a problem with random values being generated for each row in a result set in SQL Server 2008. I found a similar question here, but implementing the proposed answer left me with the same problem as before. When I run the query below, the same values sometimes show up in consecutive rows, even though I'm calling NEWID() anew for each row.

DECLARE @Id int = 0
DECLARE @Counter int = 1
DECLARE @Value int

CREATE TABLE #Table1
(
id int identity(1,1)
,Value int
)

WHILE @Counter < 100000
BEGIN
    INSERT INTO #Table1 (Value)
    SELECT CAST(RAND(CHECKSUM(NEWID())) * 100000 as INT)
    SET @Counter += 1
END

SET @Counter = 0
WHILE @Counter < 5
BEGIN
    SELECT
        @Value = T.Value
        ,@Id = T.id
    FROM #Table1 T
    WHERE T.id = CAST(RAND(CHECKSUM(NEWID())) * 100000 as INT) + 1 + @Counter

    IF @Id <> 0
        SELECT @Value AS Value ,@Id as ID
    SET @Counter += 1
END
DROP TABLE #Table1

If I change the INT to a BIGINT, as suggested in the link I provided, nothing changes, so I don't believe it's an overflow issue.
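That change amounts to nothing more than widening the cast, e.g.:

CAST(RAND(CHECKSUM(NEWID())) * 100000 as BIGINT) + 1 + @Counter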

whobetter
  • Edit: Actually, I'm adding the counter to the `WHERE` statement here, so it can't be an issue with generating the random number, but rather with updating the variables, am I correct? – whobetter May 24 '13 at 16:25
  • You never set @Id back to 0 after displaying it, so it will display again on the next iteration of the loop if nothing is found. – τεκ May 24 '13 at 16:31
  • This is not the case. The double rows happen with Ids and Values other than 0. Also, @Id is updated via the select statement during each iteration, regardless of whether or not it is displayed. – whobetter May 24 '13 at 16:40

1 Answer


If I take the calculation out of the select, I don't get the doubled rows:

DECLARE @Id int = 0
DECLARE @Counter int = 1
DECLARE @Value int
-- new variable to hold the derived row id
DECLARE @DERIVED INT 

CREATE TABLE #Table1
(
id int identity(1,1)
,Value int
)

WHILE @Counter < 100000
BEGIN
    INSERT INTO #Table1 (Value)
    SELECT CAST(RAND(CHECKSUM(NEWID())) * 100000 as INT)
    SET @Counter += 1
END

SET @Counter = 0
WHILE @Counter < 5
BEGIN
    -- set here to remove the calculation from the SELECT
    SET @DERIVED = CAST(RAND(CHECKSUM(NEWID())) * 100000 as INT) + 1 + @Counter;
    SELECT
        @Value = T.Value
        ,@Id = T.id
    FROM #Table1 T
    WHERE T.id = @DERIVED

    IF @Id <> 0
        SELECT @Value AS Value ,@Id as ID;
    SET @Counter += 1
END
DROP TABLE #Table1

I'm seeing the duplicates every time with the pseudorandom generator inside the SELECT. Oddly enough, I get about the same frequency of duplicates in the insert loop whether or not the calculation is inside the INSERT ... SELECT; that could be coincidence, since we're dealing with a randomly selected number. Note also that, since you're adding to the pseudorandom result, the results aren't technically duplicates; they're descending sequences:

11111 + 1 + 1 = 11113
11110 + 1 + 2 = 11113

Same overall result, different pseudorandom result. However, if I change

CAST(RAND(CHECKSUM(NEWID())) * 100000 as INT) + 1 + @Counter

to

CAST(RAND(CHECKSUM(NEWID())) * 100000 as INT) + @Counter + @Counter

I still consistently get duplicates. That implies the optimizer may be caching/re-using values, at least in the SELECT. I'd call that improper for a non-deterministic function call. I get similar results on 10.0.1600 and 10.50.1600 (SQL Server 2008 RTM and 2008 R2 RTM).
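As an aside (not something the question used), a common way to sidestep RAND() entirely is to derive the row id straight from the checksum of a NEWID(). Here's a minimal sketch against the same #Table1, with @RandomId and @PickedValue as illustrative names:

DECLARE @RandomId int, @PickedValue int

-- ABS(CHECKSUM(NEWID())) % 100000 yields 0..99999; adding 1 shifts that to 1..100000
SET @RandomId = ABS(CHECKSUM(NEWID())) % 100000 + 1

SELECT @PickedValue = T.Value
FROM #Table1 T
WHERE T.id = @RandomId

-- @PickedValue stays NULL if no row has that id
SELECT @PickedValue AS Value, @RandomId AS ID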

JAQFrost
  • Jason, thanks for your effort. I see the same results you do. I also was able to eliminate the duplicates by setting `@Id` and `@Value` to NULL after incrementing the counter. Thanks again. – whobetter May 31 '13 at 14:32
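A minimal sketch of that reset, applied to the second loop from the question (assuming the declarations and #Table1 from above are still in place):

SET @Counter = 0
WHILE @Counter < 5
BEGIN
    SELECT
        @Value = T.Value
        ,@Id = T.id
    FROM #Table1 T
    WHERE T.id = CAST(RAND(CHECKSUM(NEWID())) * 100000 as INT) + 1 + @Counter

    IF @Id <> 0
        SELECT @Value AS Value ,@Id as ID
    SET @Counter += 1
    -- Clear the variables so an iteration whose WHERE matches no row
    -- cannot redisplay the values carried over from the previous iteration
    SET @Id = NULL
    SET @Value = NULL
END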