10

I'm trying to randomly insert values from a list of pre-defined values into a table for testing. I tried using the solution found on this StackOverflow question:

stackoverflow.com/.../update-sql-table-with-random-value-from-other-table

When I I tried this, all of my "random" values that are inserted are exactly the same for all 3000 records.

When I run the part of the query that actually selects the random row, it does select a random record every time I run it by hand, so I know the query works. My best guesses as to what is happening are:

  • SQL Server is optimizing the SELECT somehow, not allowing the subquery to be evaluated more than once
  • The random value's seed is the same on every record the query updates

I'm stuck on what my options are. Am I doing something wrong, or is there another way I should be doing this?

This is the code I'm using:

DECLARE @randomStuff TABLE ([id] INT, [val] VARCHAR(100))

INSERT INTO @randomStuff ([id], [val]) 
VALUES ( 1,  'Test Value 1' )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 2,  'Test Value 2' )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 3,  'Test Value 3' )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 4,  'Test Value 4' )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 5,  'Test Value 5' )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 6,  null )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 7,  null )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 8,  null )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 9,  null )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 10, null )

UPDATE MyTable
SET MyColumn = (SELECT TOP 1 [val] FROM @randomStuff ORDER BY NEWID())
Community
  • 1
  • 1
Dan Herbert
  • 99,428
  • 48
  • 189
  • 219

6 Answers6

15

When the query engine sees this...

(SELECT TOP 1 [val] FROM @randomStuff ORDER BY NEWID())

... it's all like, "ooooh, a cachable scalar subquery, I'm gonna cache that!"

You need to trick the query engine into thinking it's non-cachable. jfar's answer was close, but the query engine was smart enough to see the tautalogy of MyTable.MyColumn = MyTable.MyColumn, but it ain't smart enough to see through this.

UPDATE MyTable
   SET MyColumn = (SELECT TOP 1 val
                     FROM @randomStuff r
                          INNER JOIN MyTable _MT
                                  ON M.Id = _MT.Id
                    ORDER BY NEWID())
 FROM MyTable M

By bringing in the outer table (MT) into the subquery, the query engine assumes subquery will need to be re-evaluated. Anything will work really, but I went with the (assumed) primary key of MyTable.Id since it'd be indexed and would add very little overhead.

A cursor would probably be just as fast, but is most certainly not as fun.

Community
  • 1
  • 1
Alex Papadimoulis
  • 2,720
  • 1
  • 20
  • 24
  • OK, I can't remember if you can do the INNER JOIN like that in SQL Server 2000, but there is a way around it, which I used all the time before going 2005. That was too many years ago for me to remember, though. But this should work in 2005 and later just fine. – Alex Papadimoulis Sep 24 '09 at 06:26
  • 2
    +1 this is fantastic, hower there is a small typo, `ON MT.Id = _MT.Id` should be `ON M.Id = _MT.Id` – Rippo Oct 12 '11 at 08:40
  • As suggested by @Rippo, the full query should be: `UPDATE M SET MyColumn = (SELECT TOP 1 val FROM @randomStuff INNER JOIN MyTable _MT ON M.Id = _MT.Id ORDER BY NEWID()) FROM MyTable M`. I also noticed that `r` is not used and changed to `UPDATE M` instead of `UPDATE MyTable`. +1 for cursors not as fun – m-smith Jul 01 '14 at 11:40
2

use a cross join to generate random data

1

I've had a play with this, and found a rather hacky way to do it with the use of an intermediate table variable.

Once @randomStuff is set up, we do this (note in my case, @MyTable is a table variable, adjust accordingly for your normal table):

DECLARE @randomMappings TABLE (id INT, val VARCHAR(100), sorter UNIQUEIDENTIFIER)

INSERT INTO @randomMappings 
SELECT M.id, val, NEWID() AS sort 
FROM @MyTable AS M 
CROSS JOIN @randomstuff

so at this point, we have an intermediate table with every combination of (mytable id, random value), and a random sort value for each row specific to that combination. Then

DELETE others FROM @randomMappings AS others 
INNER JOIN @randomMappings AS lower 
ON (lower.id = others.id) AND (lower.sorter < others.sorter)

This is an old trick which deletes all rows for a given MyTable.id except for the one with the lower sort value -- join the table to itself where the value is smaller, and delete any where such a join succeeded. This just leaves behind the lowest value. So for each MyTable.id, we just have one (random) value left.. Then we just plug it back into the table:

UPDATE @MyTable
SET MyColumn = random.val
FROM @MyTable m, @randomMappings AS random
WHERE (random.id = m.id)

And you're done!

I said it was hacky...

Cowan
  • 37,227
  • 11
  • 66
  • 65
0

I don't have time to check this right now, but my gut tells me that if you were to create a function on the server to get the random value that it would not optimize it out.

then you would have

UPDATE MyTable
Set MyColumn = dbo.RANDOM_VALUE()
tster
  • 17,883
  • 5
  • 53
  • 72
0

There is no optimization going on here.

Your using a subquery that selects a single value, there is nothing to optimize.

You can also try putting a column from the table your updating in the select and see if that changes anything. That may trigger an evaluation for every row in MyTable

UPDATE MyTable
SET MyColumn = (SELECT TOP 1 [val] FROM @randomStuff ORDER BY NEWID()
    WHERE MyTable.MyColumn = MyTable.MyColumn )
John Farrell
  • 24,673
  • 10
  • 77
  • 110
0

I came up with a solution which is a bit of a hack and very inefficient (10~ seconds to update 3000 records). Because this is being used to generate test data, I don't have to be concerned about speed however.

In this solution, I iterate over every row in the table and update the values one row at a time. It seems to work:

DECLARE @rows INT 
DECLARE @currentRow INT

SELECT @rows = COUNT(*) FROM dbo.MyTable
SET @currentRow = 1

WHILE @currentRow < @rows
BEGIN 

UPDATE MyTable
SET MyColumn = (SELECT TOP 1 [val] FROM @randomStuff ORDER BY NEWID())
WHERE MyPrimaryKey = (SELECT b.MyPrimaryKey
 FROM(SELECT a.MyPrimaryKey, ROW_NUMBER() OVER (ORDER BY MyPrimaryKey) AS rownumber
      FROM MyTable a) AS b
 WHERE @currentRow = b.rownumber
)

SET @currentRow = @currentRow + 1
END 
Dan Herbert
  • 99,428
  • 48
  • 189
  • 219