1

I am in the process of creating demo data from data that contains Patient History Information (PHI). There are a few columns where I just want to generate a random value that stays consistent throughout the data. For example, there is a field like SSN that I want to create a random 9 number digit for each unique SSN, but keeping this number the same where the claims are of the same individual. So 1 SSN might have 5 claims and each claim will have the same randomly created SSN.

sample

ssn           date1       procedure
443234432     1/1/2019    needle poke
443234432     1/2/2019    needle poke
676343522     1/3/2019    total knee procedure
443234432     1/4/2019    total hip procedure
676343522     1/5/2019    needle poke

final

ssn           date1       procedure
856345544     1/1/2019    needle poke
856345544     1/2/2019    needle poke
979583338     1/3/2019    total knee procedure
856345544     1/4/2019    total hip procedure
979583338     1/5/2019    needle poke

As you can see, the snn changed, but stays the same for all instances where the ssn was the same.

For numbers like this, I can convert to a numeric and multiply/divide/add/subtract to create a random number that maintains integrity, but how can I handle this for instances where there are non-numerics?

I attempted to use HASHBYTES but am getting lots of strange characters. Is there another method that could generate a random value and maintain consistency throughout the dataset?

RIGHT(convert(NVARCHAR(10), HASHBYTES('MD5', SSN)),10) as SSN

RESULTS:
댛량뇟㻣砖聋蠤

I've read a number of articles about this, but I couldn't find much about maintaining consistency across multiple claims. I appreciate any feedback.

CandleWax
  • 2,159
  • 2
  • 28
  • 46
  • This a great question. I am running in to a similar issue! – Pat Doyle Jan 10 '19 at 15:26
  • The latest SSMS (preview) actually may have the functionality you're after out of the box: [Static Data Masking (SSMS 18.0 Preview)](http://www.sqlservercentral.com/blogs/sqlespresso/2018/12/19/static-data-masking-ssms-180-preview/?utm_source=SSC&utm_medium=pubemail) – Thom A Jan 10 '19 at 15:32
  • We have been using `HASHBYTES('SHA1', CONVERT(varchar(max), sourceID))` for a number of years with consistency producing results like `0x166A0DD5 ......` which we use directly – Peter Smith Jan 10 '19 at 15:38

3 Answers3

1

I if i understand your query it's to convert varbinary to varchar look at this article : varbinary to string on SQL Server

And you can try this code :

SELECT RIGHT(CONVERT(VARCHAR(1000), HASHBYTES('MD5', 'SOMEVALUE'), 1),10);
Sanpas
  • 1,170
  • 10
  • 29
1

I think you want printable characters. In that case, you can use the CONVERT function to translate the bytes result of a HASHBYTES to a hex representation as string. Just make sure to pass the value 2 as the third parameter.

DECLARE @SomeValue VARCHAR(100) = CONVERT(VARCHAR(100), NEWID())

SELECT
    @SomeValue AS Original,
    CONVERT(
        VARCHAR(20), 
        HASHBYTES('MD5', @SomeValue), 
        2) AS Scrambled

A few results:

Original                                Scrambled
BC9EC2E0-2009-45FA-AA95-64585B815BD9    A33AEBC011E9188EB97E
6FF7E0FE-E054-49D7-A451-80111BF5B200    94F93C6A5CBD0E56C70B
C8F8CD77-96B7-4B74-84B7-4EB3412C6CE7    2994341068CE8C4E1EF9

Put the length you want as the varchar target in the first parameter.

Please be advised that hash functions might generate same result on different inputs, and it will specially if you are truncating the result to the first N characters.

EzLo
  • 13,780
  • 10
  • 33
  • 38
0

I don't understand your issue:

SELECT HASHBYTES('MD5', N'Wahoooo') 

This works just fine and will be the same value all the time. The garbled chars issue is probably you trying to convert a varbinary value into nvarchar.

SELECT CAST(HASHBYTES('MD5', N'Wahoooo') AS nvarchar(10))
dfundako
  • 8,022
  • 3
  • 18
  • 34
  • The HASHBYTES returns a string, but even when casting with your code below I get `∽樾ꍉ�ῗﷄ` – CandleWax Jan 10 '19 at 15:34
  • 3
    @MartinBobak HashBytes absolutely does not return a string. Why are you casting the value returned by hashbytes anyway? – dfundako Jan 10 '19 at 15:35