I have a SQL table that stores unique nvarchar data set to 60 characters max.

I now need to output each value to a file on a daily basis. This file is then fed into a 3rd party system.

However, this 3rd party system requires each value to be at most 10 characters long. The values do not have to match what is in the table; they just need to be unique and 10 characters max. They must also be consistent: the same id must be produced for a given table value every day.

I cannot truncate the string as it could then lose its uniqueness.

Looking at my options, I could:

  1. Use GetHashCode()
  2. Convert to Hexadecimal

GetHashCode looks like a simple, straightforward option, and I get the same value each time it is run. However, the Microsoft documentation recommends against using it for my purpose...

https://learn.microsoft.com/en-us/dotnet/api/system.string.gethashcode?redirectedfrom=MSDN&view=netframework-4.8#System_String_GetHashCode

As a result, hash codes should never be used outside of the application domain in which they were created, they should never be used as key fields in a collection, and they should never be persisted.

With Hexadecimal conversion, it may also lose uniqueness when trimmed to 10 characters.

I have also looked at this example but again I'm not sure how reliable it is with uniqueness: A fast hash function for string in C#

static UInt64 CalculateHash(string read)
{
    UInt64 hashedValue = 3074457345618258791ul;
    for(int i=0; i<read.Length; i++)
    {
        hashedValue += read[i];
        hashedValue *= 3074457345618258799ul;
    }
    return hashedValue;
}

Are there any other options available to me?

K09
  • consider using Guid – Shubham May 08 '19 at 10:28
  • Just generate a sequence, starting at "AAAAAAAAAA" and progressing upwards through "AAAAAAAAAB" etc. Ideally store this in the database, but otherwise map the real string to the sequence in a Dictionary. – Steve Todd May 08 '19 at 10:34
  • The pigeon-hole principle applies here. There is no way you can take a 60-character unique key **in the general case** and shoehorn that into a 10-character string and keep them all unique. If you can post limitations to this 60-character string, like "They're all digits ONLY", then *perhaps* someone can come up with an algorithm. – Lasse V. Karlsen May 08 '19 at 10:51
  • GetHashCode also does not guarantee the absence of collisions; that's not its purpose. – Lasse V. Karlsen May 08 '19 at 10:51
  • Your best option, in my opinion, is to add a separate 10-character column to the database and assign new, unique keys to this column as you export the rows. – Lasse V. Karlsen May 08 '19 at 10:52
  • As an example, if that 60-character string only allows 2 distinct characters to be used, 1s and 0s, you would have 2^60 different combinations, which is a number that is 19 digits long. You could store that uniquely in a 10-character string if you allowed at least 64 unique characters to be used. https://www.wolframalpha.com/input/?i=64%5E10+%3E%3D+2%5E60 – Lasse V. Karlsen May 08 '19 at 10:57
  • If that 60-character string allows all letters + all digits to be used (36 symbols, ignoring case), the number of unique combinations balloons to a number that is 94 digits long. – Lasse V. Karlsen May 08 '19 at 10:58
  • And my "best option" answer above is just *hoping* that you don't have so many unique rows that 10 is not enough. – Lasse V. Karlsen May 08 '19 at 10:58
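The counts quoted in the comments above can be checked with a few lines of C#. This is only a sketch: the class and method names (`KeySpace`, `DigitCount`, `Fits`) are made up for illustration, and it assumes "all letters + all digits" means 36 case-insensitive symbols.

```csharp
using System;
using System.Numerics;

// Hypothetical helper for checking the pigeonhole arithmetic from the comments.
static class KeySpace
{
    // Number of decimal digits in alphabetSize^length,
    // i.e. how large the count of possible strings is.
    public static int DigitCount(int alphabetSize, int length) =>
        BigInteger.Pow(alphabetSize, length).ToString().Length;

    // True when there are at least as many target strings as source strings,
    // which is the minimum requirement for a collision-free mapping to exist.
    public static bool Fits(int sourceAlphabet, int sourceLength,
                            int targetAlphabet, int targetLength) =>
        BigInteger.Pow(targetAlphabet, targetLength) >=
        BigInteger.Pow(sourceAlphabet, sourceLength);
}
```

`KeySpace.DigitCount(2, 60)` gives 19 and `KeySpace.Fits(2, 60, 64, 10)` is true (64^10 equals 2^60), while `KeySpace.DigitCount(36, 60)` gives 94 and `KeySpace.Fits(36, 60, 36, 10)` is false. So a general 60-character value cannot be compressed into 10 characters without collisions, which is why assigning new keys is suggested instead of hashing.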

1 Answer

Add a unique identity column to your table and let SQL Server manage the incrementing for you. Each row then keeps the same id on every export. The column can be seeded with a large number if needed.
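An `int` identity value written in decimal is already at most 10 digits (the largest `int` is 2147483647), so it can be exported as-is. If the column were a `bigint`, or a fixed-width key were preferred, the value could be encoded in base 36 instead. The following is a minimal sketch of that idea, not part of the original answer; `ExportKey` and `FromIdentity` are hypothetical names, and it assumes the 3rd party system accepts digits and uppercase letters.

```csharp
using System;

// Hypothetical helper: turns a non-negative identity value into a
// fixed-width, 10-character base-36 string.
static class ExportKey
{
    private const string Alphabet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    private const int Width = 10;

    // 36^10: the number of distinct 10-character keys available.
    private const long Capacity = 3656158440062976L;

    public static string FromIdentity(long id)
    {
        if (id < 0 || id >= Capacity)
            throw new ArgumentOutOfRangeException(nameof(id));

        var buffer = new char[Width];

        // Fill from the rightmost character; unused positions become '0'.
        for (int i = Width - 1; i >= 0; i--)
        {
            buffer[i] = Alphabet[(int)(id % 36)];
            id /= 36;
        }

        return new string(buffer);
    }
}
```

For example, `ExportKey.FromIdentity(36)` returns `"0000000010"`. Because the encoding is a plain change of base, distinct identity values always map to distinct keys, and the same row produces the same key every day.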

Fleshy