12

I have two tables. Table 1 has about 80 rows and Table 2 has about 10 million.

I would like to update all the rows in Table 2 with a random row from Table 1. I don't want the same row for all the rows. Is it possible to update Table 2 and have it randomly select a value for each row it is updating?

This is what I have tried, but it puts the same value in each row.

```sql
update member_info_test
set hostessid = (SELECT TOP 1 hostessId FROM hostess_test ORDER BY NEWID())
```

**Edited

marc_s
chobo
  • This will point you in the right direction : http://stackoverflow.com/questions/19412/how-to-request-a-random-row-in-sql – Landjea Oct 25 '12 at 20:41
  • You don't want the same record for one? Difficult when the first table has 80 and the table you want to update has 10M records. – Tim Schmelter Oct 25 '12 at 20:42
  • Well not all the same records for every record. I just want it to use the 80 records from that one table – chobo Oct 25 '12 at 20:48
  • Do you need to do an update? Can you just remove all records and do an insert? – Abe Miessler Oct 25 '12 at 20:48
  • Your query looks okay. The only thing I can think of is that the optimizer is executing the subquery only once. It should not be doing so, because `newid()` is volatile. – Gordon Linoff Oct 25 '12 at 20:54

4 Answers

16

OK, I think this is one of the weirdest queries I've written, and I think it is going to be terribly slow. But give it a shot:

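```sql
UPDATE A
SET A.hostessid = B.hostessId
FROM member_info_test A
CROSS APPLY (SELECT TOP 1 hostessId
             FROM hostess_test
             -- Self-comparison on an outer column: true for every non-NULL value,
             -- but it correlates the subquery with A so it can be re-evaluated per row.
             WHERE A.somecolumn = A.somecolumn
             ORDER BY NEWID()) B
```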
Lamak
  • The values are all the same :( – chobo Oct 25 '12 at 21:00
  • @chobo - Really? I tested this with sample data and it worked fine. But to get the different values, the `WHERE A.somecolumn = A.somecolumn` was mandatory – Lamak Oct 25 '12 at 21:06
  • I don't know why it works for you, but I get the same values in each row – chobo Oct 25 '12 at 21:59
  • Actually this query sort of works, but what confuses me is the Where A.somecolumn = A.somecolumn. I seem to get different results depending on the columns I use – chobo Oct 26 '12 at 16:19
  • It seems the more different values A.someColumn has the more random the results – chobo Oct 26 '12 at 16:23
  • @chobo that may be the reason. I tried it with the key for that table and it worked as intended. – Lamak Oct 26 '12 at 16:29
  • You would need to correlate on a unique value from `A` to be sure that if a spool is added it is always rebound not rewound. [Duplicate of this question](http://stackoverflow.com/a/12922951/73226) – Martin Smith Oct 26 '12 at 17:40
  • @MartinSmith I did that on my test query, but not knowing that it was a necessity. And your answer explains why it works that way, thanks – Lamak Oct 26 '12 at 18:19
1

I think this will work (at least, the with portion does):

```sql
with toupdate as (
      select (select top . . . hostessId from hostess_test where mit.hostessId = mit.hostessId order by newid()) as newval,
             mit.*
      from member_info_test mit
     )
update toupdate
    set hostessid = newval;
```

The key to this (and to Lamak's answer) is the outer correlation in the subquery, which convinces the optimizer to actually run the subquery for each row. I don't know why this would work and the other version would not.
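As a rough sketch of that outer correlation applied to the tables in the question, the statement below correlates on a unique key, as suggested in the comments on Lamak's answer. The column name `member_id` is an assumption (the question does not name a key column); substitute whatever unique, non-NULL column `member_info_test` has. Whether the subquery is re-evaluated per row still depends on the plan the optimizer picks, so it is worth checking the result afterwards.

```sql
-- member_id is a hypothetical unique, non-NULL key column of member_info_test.
UPDATE mit
SET hostessid = (SELECT TOP (1) ht.hostessId
                 FROM hostess_test AS ht
                 -- Outer reference to a unique column of the updated table
                 WHERE mit.member_id = mit.member_id
                 ORDER BY NEWID())
FROM member_info_test AS mit;

-- Sanity check: more than one distinct hostessid should appear,
-- with the rows spread across the values from hostess_test.
SELECT hostessid, COUNT(*) AS rows_assigned
FROM member_info_test
GROUP BY hostessid
ORDER BY rows_assigned DESC;
```

If the check shows a single value, the subquery was evaluated only once and the correlation column is the first thing to revisit.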

Gordon Linoff
  • If you put in a 1 where the `. . .` is, then it should work. Any idea why I cannot insert this code? – Gordon Linoff Oct 25 '12 at 21:15
  • This worked fine for me in SQL2012; I had to omit the `mit` alias in the `update` portion, however. All-in-all, a good, logical, solution for a one-off problem. – Paul Suart Jun 14 '17 at 09:30
0

Here is what I ended up using:

EnvelopeInformation would be your Table 2.

PaymentAccountDropDown would be your Table 1 (in my case I had 3 items); change 3 to 80 for your use case.

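```sql
;WITH cteTable1 AS (
    -- Rows to update (EnvelopeInformation), numbered in random order
    SELECT
        ROW_NUMBER() OVER (ORDER BY NEWID()) AS n,
        PaymentAccountDropDown_Id
    FROM EnvelopeInformation
    ),
cteTable2 AS (
    -- Source values (PaymentAccountDropDown), numbered 1..3 in random order
    SELECT
        ROW_NUMBER() OVER (ORDER BY NEWID()) AS n,
        t21.Id
    FROM PaymentAccountDropDown t21
    )
UPDATE cteTable1
   SET PaymentAccountDropDown_Id = (
       SELECT Id
       FROM cteTable2
       -- (n % 3) + 1 cycles through 1..3; 3 is the number of source rows
       WHERE  (cteTable1.n % 3) + 1 = cteTable2.n
)
```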
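For the tables in the original question, the same idea might look like the sketch below, with the hard-coded 3 replaced by a count read from the source table. The CTE names `targets` and `sources` and the variable `@total` are illustrative choices, not part of the answer above. Note that this hands out source rows round-robin over a shuffled order, so every `hostessId` is used almost equally often; it is not an independent random draw per row.

```sql
-- Number of source rows (about 80 in the question), computed instead of hard-coded.
DECLARE @total int = (SELECT COUNT(*) FROM hostess_test);

WITH targets AS (
    -- Rows to update, numbered in random order
    SELECT hostessid,
           ROW_NUMBER() OVER (ORDER BY NEWID()) AS n
    FROM member_info_test
),
sources AS (
    -- Source values, numbered 1..@total in random order
    SELECT hostessId,
           ROW_NUMBER() OVER (ORDER BY NEWID()) AS n
    FROM hostess_test
)
UPDATE t
SET t.hostessid = s.hostessId
FROM targets AS t
JOIN sources AS s
  ON s.n = (t.n % @total) + 1;   -- maps every target number onto 1..@total
```

Because each target row number maps to exactly one source row number, every row in `member_info_test` is updated exactly once, independent of how often the optimizer evaluates `NEWID()`.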

reference: http://social.technet.microsoft.com/Forums/sqlserver/pt-BR/f58c3bf8-e6b7-4cf5-9466-7027164afdc0/updating-multiple-rows-with-random-values-from-another-table

Leblanc Meneses
0

Update Table with Random fields

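```sql
UPDATE p
    SET p.City = b.City
    FROM Person p
    -- The Id filter below picks a random value in 0..(row count - 1). It relies on
    -- Id values covering that range without gaps; if no row matches, CROSS APPLY
    -- returns nothing and that Person row is left unchanged.
    CROSS APPLY (SELECT TOP 1 City
                 FROM z.CityStateZip
                 WHERE p.SomeKey = p.SomeKey and -- ... the magic! ↓↓↓
                 Id = (Select ABS(Checksum(NewID()) % (Select count(*) from z.CityStateZip)))) b
```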
Community
CSharper