0

In my application, I fetch all tables in Database. User will select table name and colum names to be masked.

Now i want to update sql table-columns with random generate string , which must be unique for each row without using primary key or unique key.

For example, In my Employeedb i have a table Employee. Out of columns in Employee table, i want to mask data in name and city columns. If table conatins 1000 rows, i want change name and city columns with 1000 unique values each. That means i want to update row by row.

Name Address City Raghav flatno34 mumbai Ranveer flatno23 chennai

This is orignal data

Name Adress City Sbgha flatno34 mmjgujj Lkhhvh flatno23 huughh

This is expected out

The table have primarykey sometimes.. There may be chances of not having primary key.

I have one more qn, I have this expected output in a datatable. Since i cannot predefine the table name and number of fields how will i write an update qry.

Dhruv
  • 1
  • 1
  • Do some tables not have a primary key, or don't you know that they have? If neither, how will you update preserve foreign key constraints? What if you change the CountryCode column from "IN" to "XY"? You could update records not having a primary key by including all columns in the WHERE clause, but that's not guaranteed unique. Why do you want to code this yourself anyway? There are tools available for data scrubbing/anonymization. – CodeCaster Nov 24 '21 at 11:42
  • _i want change name and city columns with 1000 unique values each_ Are you certain about this? Shouldn't your goal be to obfuscate so that your changed values are still statistically distributed in the same manner as the original? And since you only discuss strings, how will you handle other datatypes like date and integer? Did you consider columns that participate in constraints? Be careful what you assume. – SMor Nov 28 '21 at 00:48

1 Answers1

0

I think you will find my blog post entitled How to pre-populate a random strings pool very helpful for this requirement.
(Inspired by this SO answer from Martin Smith, to give credit where credit is due)
It describes an inline table valued user defined function that generates a table of random values, which you can use to update your data.

However, it does not guarantee uniqueness of these values. For that, you must use DISTINCT when selecting from it.
One problem you might encounter because of that is having a result with less values than you generated, but for 1,000 records per table as you wrote in the question it's probably not going to be a problem, since the function can generate up to 1,000,000 records each time you call it.

For the sake of completeness, I'll post the code here as well, but you should probably read the post at my blog. Also, there's another version of this function in another blog post entitled A more controllable random string generator function for SQL Server - which gives you better control over the content of the random strings - i.e a string containing only numbers, or only lower digits.

The first thing you need to do is create a view that will generate a new guid for you, because this can't be done inside a user-defined function:

CREATE VIEW GuidGenerator
AS
    SELECT Newid() As NewGuid

Then, the function code: (Note: this is the simpler version)

CREATE FUNCTION dbo.RandomStringGenerator
(
    @Length int,
    @Count int -- Note: up to 1,000,000 rows
)
RETURNS TABLE
AS
RETURN
 
-- An inline tally table with 1,000,000 rows
WITH E1(N) AS (SELECT N FROM (VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10)) V(N)),   -- 10
     E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100
     E3(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000
     Tally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY @@SPID) FROM E3 a, E2 b) --1,000,000
 
 
SELECT TOP(@Count) (
    SELECT  TOP (@Length) CHAR(
            -- create a random number from a guid using the GuidGenerator view, mod 3.
            CASE  Abs(Checksum(NewGuid)) % 3
                WHEN 0 THEN 65 + Abs(Checksum(NewGuid)) % 26 -- Random upper case letter
                WHEN 1 THEN 97 + Abs(Checksum(NewGuid)) % 26 -- Random lower case letter
                ELSE 48 + Abs(Checksum(NewGuid)) % 10 -- Random digit
            END
            )
    FROM Tally As t0  
    CROSS JOIN GuidGenerator 
    WHERE  t0.n != -t1.n -- Needed for the subquery to get re-evaluated for each row
    FOR XML PATH('')
    ) As RandomString
FROM Tally As t1

Then, you can use it like this to get a distinct random string:

SELECT DISTINCT RandomString 
FROM dbo.RandomStringGenerator(50, 5000); 
Zohar Peled
  • 79,642
  • 10
  • 69
  • 121