0

I'm using SQL Server 2014 Developer Edition Service Pack 2 on Windows 7 Enterprise machine.

The question

Is there a set based way that I can create an integer field based on a string input? It must ensure that the Entity ID field is never duplicated for different inputs?

Hypothetical table structure

|ID|Entity ID|Entity Code|Field One|From Date|To Date |
|1 |1        |CodeOne    |ValueOne |20160731 |20160801|
|2 |1        |CodeOne    |ValueTwo |20160802 |NULL    |
|3 |2        |CodeTwo    |ValueSix |20160630 |NULL    |

Given the above table, I'm trying to find a way to create the Entity ID based on the Entity Code field (it is possible that we would use a combination of fields)

What I've tried so far

  1. Using a sequence object (don't like this because it is too easy for the sequence to be dropped and reset the count)
  2. Creating a table to track the Entities, creating a new Entity ID each time a new Entity is discovered (don't like this because it is not a set based operation)
  3. Creating a hashbyte on the Entity Code field and converting this to a BIGINT (I have no proof that this won't work but it doesn't feel like this is a robust solution)

Thanks in advance all.

GreenyMcDuff
  • 3,292
  • 7
  • 34
  • 66
  • Can you please provide some samples..for this statement .."Is there a set based way that I can create an integer field based on a string input?" – TheGameiswar Aug 04 '16 at 13:48
  • If you mean that you want to take a string and derive a unique and _reproducible_ integer with no other context, then no. A hash may fold several different strings down to the same value. You can play the game of creating a value, e.g. by hashing, and then incrementing it until you find a previously unused value, but that requires context, i.e. all of the previously assigned values, and cannot be reproduced with only the string as input. Special cases may apply, e.g. if you know the string length will never exceed four characters, the character set is limited, ... . – HABO Aug 04 '16 at 13:56
  • I feel like the only way to accomplish this is with a `cursor` that checks to see if each `EntityCode` about to be inserted already exists and if not, incrementing the max `EntityID` to create a new `EntityID` for that `EntityCode`. Which would obviously perform very slowly and not be set based at all. – iamdave Aug 04 '16 at 16:41

1 Answers1

0

Your concerns over HashBytes collisions is understandable, but I think yo can put your worries aside. see How many random elements before MD5 produces collisions?

I've used this technique when masking tens of thousands of customer account numbers. I've yet to witness a collision

Select cast(HashBytes('MD5', 'V5H 3K3') as int)

Returns -381163718

(Note: as illustrated above, you may see negative values. We didn't mind)

Community
  • 1
  • 1
John Cappelletti
  • 79,615
  • 7
  • 44
  • 66