0

I store a list of names in the database. The list does not have any primary key for each record. These names are then used by another program and so require a primary key to be used as a foreign key.

The list is owned by a third party which is beyond our influence so we can't force them to provide a primary key.

I was using an incremental number to have PK but it has recently been noticed that the new records can be inserted anywhere in the list and so the incremental approach would fail. Hash codes can not be used as they are not guaranteed to be stable.

Is there any way i can generate a stable Primary Key for these records which can then be traced back with the name OR any other approach to refer back to a name in the list?

For example

On day 1 the list is like below

  1. Oliver
  2. Jack
  3. Harry
  4. Jacob
  5. Charlie

Assume that 963 is the key (as hash code) for Oliver and this code has been saved by program XYZ in its storage to refer back to Oliver.

On other day a new name Oscar has been added in the middle

  1. Oliver
  2. Jack
  3. Oscar
  4. Harry
  5. Jacob
  6. Charlie

Now the key for Oliver is 852 because Hash code is not guaranteed to be stable and sequencing has been disturbed too hence the program XYZ can not refer back to Oliver using the key 963.

bjan
  • 2,000
  • 7
  • 32
  • 64
  • Is the list of names being refreshed daily, like sanctions? If it's not stable then you either shouldn't use a key relationship, or alternatively, you will have to perform a hash and use that as the key. You'll have to refresh the entire database when you get new data though since a person's name changing will not remove or update the old record since the hash will be different. – NibblyPig Mar 14 '19 at 10:03
  • @NibblyPig Yes, the list is HMT list. – bjan Mar 14 '19 at 10:09
  • Hah, I knew it. Yes, I had this exact problem in the past, the only way we did it was by creating a checksum of every column, and every time a new file was loaded, we would delete all of the records and re-create them. We couldn't use a foreign key relationship because of this, so we had to store a copy of all of the data whenever we used it. – NibblyPig Mar 14 '19 at 10:17
  • I don't get it. How does the *hash* code for "Oliver" change, because a completely unrelated entry has been added? It's not a hash code if it is not generated *only* from the original value. Use a *real* hash code and you should be fine. – Sefe Mar 14 '19 at 10:28
  • @NibblyPig You would know that a single record in HMT would contain 500 to 1000 characters, won't saving each record for each name screening generate lot of records!!! – bjan Mar 14 '19 at 10:29
  • @Sefe Hash codes are not guaranteed to be stable, they change based on platform, framework version. https://andrewlock.net/why-is-string-gethashcode-different-each-time-i-run-my-program-in-net-core/ – bjan Mar 14 '19 at 10:33
  • We just stored name and date of birth for each record that matched. You can always look up the record with a name search, 99% of the time it will be there with a different ID. Otherwise the best you can do is say that it matched a record at that time. – NibblyPig Mar 14 '19 at 10:42
  • @NibblyPig did you try the solution to this question https://stackoverflow.com/questions/36845430/persistent-hashcode-for-strings – bjan Mar 14 '19 at 10:48
  • Yes, we created hashes by adding all the strings together (concatenation) and then doing an MD5 checksum and storing it as the ID. We did it in SQL as the rows were added. – NibblyPig Mar 14 '19 at 10:49
  • @bjan: Well you have to distingiush: Yes it may be that `string.GetHashCode()` is implemented differently on different platforms. But if you use a hash function like SHAxx or MDxx it is guaranteed to produce the same output on every platform, because it's standardized. – derpirscher Mar 14 '19 at 10:57
  • @derpirscher: It's not only standardized, the whole point of a hash function is to allow reliable comparison of values when the hash function is known. – Sefe Mar 14 '19 at 15:24

0 Answers0