2

I am trying to update multiple rows with random 9 digit number using the following code.

UPDATE SGT_EMPLOYER 
SET SSN = (CONVERT(NUMERIC(10,0),RAND()  * 899999999) + 100000000)
WHERE EMPLOYER_ACCOUNT_ID = 123456789;

Expected result: the query should update 300 rows with 300 random 9 digit numbers.

Actual: query is updating 300 rows with same number as the ran() function is executing only once.

Please help. Thank You.

abcreddy
  • 102
  • 3
  • 19
  • 1
    Possible duplicate of [Why is RAND() not producing random numbers?](https://stackoverflow.com/questions/8378587/why-is-rand-not-producing-random-numbers) – mjw Jun 13 '17 at 21:19
  • 2
    Solving the current issue is barely the tip of the problems you have here. You should NEVER EVER store SSN in plain text. I realize you are scrambling it here but this is a clear indication that in the application the values are in plain text. This is not only a horrible idea it is actually a violation of laws in some states. This type of information needs to be encrypted at all times. – Sean Lange Jun 13 '17 at 21:23
  • 1
    Your update statement is working as SQL server intends - it is calculating the `rand()` once, and then applying it to the table according to the `where` clause. What you need to do is iterate through the table doing this update query to each row. Cursors or while loops should get you where you need to go. – RToyo Jun 13 '17 at 21:34
  • 1
    @SeanLange This is off topic, but I don't think the SSN here stands for Social Security Number. Even if it does, there's zero risk here, because it's literally just a random number. The original ssn is not stored - neither encrypted; hashed; scrambled; or otherwise. – RToyo Jun 13 '17 at 21:35
  • 1
    @RobbieToyota a table named SGT_EMPLOYER with column named SSN and there are 9 characters....that isn't a red flag that is in fact a social security number? I would disagree about there being zero risk. Yes once this is scrambled, probably for a test database, that system is fine. But the original system is obviously storing this incorrectly. – Sean Lange Jun 14 '17 at 13:18
  • @SeanLange Thanks for the updates. We have all the column level encryption in place for SSN related information. We just want to scramble the encrypted SSN data for test environment. – abcreddy Jun 14 '17 at 15:53
  • @RobbieToyota Thanks for the updates. We have all the column level encryption in place for SSN related information. We just want to scramble the encrypted SSN data for test environment. – abcreddy Jun 14 '17 at 15:53

3 Answers3

3

As you already figured out yourself, RAND is a run-time constant function in SQL Server. It means that it is called once per statement and the generated value is used for each affected row.

There are other functions that are called for each row. Often people use NEWID usually together with CHECKSUM as a substitute for a random number, but I would not recommend it because the distribution of such random numbers is likely to be poor.

There is a good function specifically designed to generate random numbers: CRYPT_GEN_RANDOM. It is available since at least SQL Server 2008.

It generates a given number of random bytes.

In your case it would be convenient to have a random number as a float value in the range of [0;1], same as the value returned by RAND.

So, CRYPT_GEN_RANDOM(4) generates 4 random bytes as varbinary. Convert them to int, divide by the maximum value of 32-bit integer (4294967295) and add 0.5 to shift the range from [-0.5;+0.5] to [0;1]:

(CAST(CRYPT_GEN_RANDOM(4) as int) / 4294967295.0 + 0.5)

Your query becomes:

UPDATE SGT_EMPLOYER 
SET SSN = 
    CONVERT(NUMERIC(10,0),
    (CAST(CRYPT_GEN_RANDOM(4) as int) / 4294967295.0 + 0.5) * 899999999.0 + 100000000.0)
WHERE EMPLOYER_ACCOUNT_ID = 123456789;
Vladimir Baranov
  • 31,799
  • 5
  • 53
  • 90
  • Thanks for the answer.... it worked for me but when I update half million records, i got 103 duplicate values. – abcreddy Jun 15 '17 at 03:33
  • 1
    @abcreddy, random numbers are, well, random. There is always a chance that you'll get duplicates. Generating a set of `N` **unique** random numbers is a different story. It is a bit more complicated than the query in this answer. If you need this kind of a query, you'd better ask a new question and clearly explain your constraints and requirements. As a one-off manual solution you can always add a second query that would check for duplicates and re-generate the numbers for those rows. – Vladimir Baranov Jun 15 '17 at 11:46
  • @abcreddy, have a look at these questions as well: [Generate different random time in the given interval](https://stackoverflow.com/q/23314054/4116017), [generate seemingly random unique numeric ID in SQL Server](https://stackoverflow.com/q/26967215/4116017) – Vladimir Baranov Jun 15 '17 at 11:54
1

Yes, the rand() line will only be executed once, before the rows are being updated, not every time a row is updated.

You can use a Stored Procedure to update every row with (CONVERT(NUMERIC(10,0),RAND() * 899999999) + 100000000).

Steven Lemmens
  • 1,441
  • 3
  • 17
  • 30
1

Sean Lange is 100% correct. However, if you want to quickly mask your SSN, perhaps the following using HashBytes() may help.

Example

Declare @Table table (SSN varchar(25))
Insert into @Table values
('070-99-12345'),
('123-45-67890')

Select SSN
      ,AsInt = abs(cast(HashBytes('MD5', SSN) as int))
 From  @Table

Returns

SSN             AsInt
070-99-12345    508860145
123-45-67890    843256257
John Cappelletti
  • 79,615
  • 7
  • 44
  • 66
  • Thanks for the answer John. Your query is working but it is generating 10 digit numbers as well in some cases so I have to use either Left() or rigth() to take 9 digit out of it. There are also some duplicate(350+) values when I do it for more than half a million records. – abcreddy Jun 15 '17 at 03:38