1

I'm working on a little project to depersonalise data in our development environment (like names, telephone numbers, prices, etc.). I have some ideas to handle this but i'm not sure if/how it could work. Maybe you can give me some tips.

Example:

 table1 (firstname, lastname, emplyoeeid, office) 
 table2 (employeeid, phonenumber, mobilephonenumber, device, addons)

Procedure (how it should look like):

    @databasename varchar (50)

    @tablename varchar(50)

    @attributes ???

    USE DATABASE @databasename

    UPDATE TABLE @tablename
    SET @attributes = HASHBYTES('MD5',@attributes);

Every time I run the procedure I can choose 1 tablename and a list of attributes which I want to depersonalise. In the previous example it would be

  1. run1 : @tablename (table1) @attributes ('firstname', 'lastname')
  2. run2 : @tablename (table2) @attributes ('phonenumber', 'mobilphonenumber')

The procedure should encrypt just the attributes listed in the variable. Is it possible to implement such a procedure? How can I handle the attributes-list in a variable? Are there any smarter ways to implement this logic?

Thanks for your help.

Nc_Sr
  • 65
  • 3
  • [Here is a question you might want to take a look at](http://dba.stackexchange.com/questions/78693/scrubbing-sensitive-data). – Radu Gheorghiu Aug 12 '15 at 08:35
  • [This one too](http://dba.stackexchange.com/questions/11719/scrubbing-names-via-sql-query-batch) – Radu Gheorghiu Aug 12 '15 at 08:36
  • [This can also help (so many resources online)](http://dba.stackexchange.com/questions/23786/data-obfuscation-in-sql-server) – Radu Gheorghiu Aug 12 '15 at 08:37
  • The word you are missing is "anonymise": making things anonymous. A quick search shows lots of potentially useful information. – Richard Aug 12 '15 at 08:38
  • possible duplicate of [Anonymizing customer data for development or testing](http://stackoverflow.com/questions/260307/anonymizing-customer-data-for-development-or-testing) – Richard Aug 12 '15 at 08:38
  • Thanks for your fast help. I need some time to read all the informations and rethink my solution. If you have some more infos please let me know. I'll update with my final version and mark the correct answer. – Nc_Sr Aug 12 '15 at 08:53
  • what is wrong with update table1 set firstname = HASHBYTES('MD5', firstname); – paparazzo Aug 12 '15 at 10:56
  • @Frisbee it all depends on your level of security and anonymization requierement. Just make sure this is not too big for some columns or truncate it and that it does not break Unique constraints if you truncate. – Julien Vavasseur Aug 12 '15 at 13:04
  • @JulienVavasseur I don't have a question – paparazzo Aug 12 '15 at 13:06

1 Answers1

0

You can use a Table type:

Create Type [dbo].[Columns] AS TABLE(
    [name] [sysname] NOT NULL
)
GO

Create Proc Anonymise(
    @table sysname
    , @Columns [dbo].[Columns] READONLY
    ) as 
begin
    set nocount on

    --Checks:
    --If @table not int sys.tables => error
    --If @Columns empty => error
    --@Columns not in sys.columns for @tables => error
    --Column type not char/varchar or type xxx => error

    Declare @list nvarchar(max), @sql nvarchar(max)
    Select @list = coalesce(@list+N', ', N'     ')+name+N'=AnonymiseFunction('+name+N')' From @Columns

    Set @sql = N'Update ['+@table+'] Set '+@list
    print @sql
    Exec sp_executesql @sql
end
GO

And execute:

Declare @cols [dbo].[Columns]
Insert Into @cols Values('x',) ('y')
Exec Anonymise 'table', @cols

You can also replace , @Columns [dbo].[Columns] by a nvarchar(max) comma separated. At the begining of the proc, you must declare a table variable:

declare @columns Table(name sysname)

and use a CTE to split the string into @columns table.

Split string: http://blogs.msdn.com/b/amitjet/archive/2009/12/11/sql-server-comma-separated-string-to-table.aspx

Julien Vavasseur
  • 3,854
  • 1
  • 20
  • 29