I'm working on a "data obfuscator" script. I want to update every record in a table with fake data. I have a table with locations and trying to select a random record from that table to update the records in my person table. So SQL similiar to this
UPDATE Person
SET City = (SELECT TOP 1 City
FROM z.CityStateZip c2
ORDER BY Newid())
The problem with this is that it only ever selects 1 city, instead of selecting a random City for each person. I've also tried
(SELECT TOP 1 City FROM z.CityStateZip c2 ORDER BY NEWID()), PersonId, from Person
But it still only selects 1 city (I thought sub queries ran once for every record) instead of what I desire -- a random city for each record.
I've also tried doing a join with the same results, only 1 city ever selected....
SELECT t.City,
PersonId
FROM Person
INNER JOIN (SELECT TOP 1 City
FROM z.CityStateZip c2
ORDER BY Newid()) t
ON 1 = 1
I tried to shove this statement into a function but SQL Server doesn't allow me to use NEWID()
inside a function.
Answer
I modified Giorgi's answer and the answer to the linked question and came up with this, it is very fast! I changed how I was randomly selecting a city. The order by NewId() was an issue. So Person has 5k records and CityStateZip has ~30K, I took it down from 40 seconds to 4...(now even faster without the count subquery)
DECLARE @count bigint
SELECT @count = count(*) from z.CityStateZip
UPDATE p
SET p.City= b.City
FROM Person p
CROSS APPLY (SELECT TOP 1 City -- if Id is unique, top 1 shouldn't be necessary
FROM z.CityStateZip
WHERE p.SomeKey = p.SomeKey and -- ... the magic! ↓↓↓
Id = (Select ABS(Checksum(NewID()) % @count))) b