4

Our DBA says that because the social security number (SSN) is unique, that we should use it as the primary key in a table.

While I do not agree with the DBA (and have already mentioned some of the good parts found in this answer), are there any situations where he could possibly be correct?

JonathanDavidArndt
  • 2,518
  • 13
  • 37
  • 49

3 Answers3

9

SSN is not a unique identifier for people. Whether it should be the PK in some table depends on what the rows in the table mean. (See also sqlvogel's answer.)

6.1 percent of Americans have at least two SSNs associated with their name. More than 100,000 Americans have five or more SSNs associated with their name. [...] More than 15 percent of SSNs are associated with two or more people. More than 140,000 SSNs are associated with five or more people. Significantly, more than 27,000 SSNs are associated with 10 or more people.
--idanalytics.com

See also Wikipedia Social Security number.

philipxy
  • 14,867
  • 6
  • 39
  • 83
7

Never!

In the USA, both Federally and in the several States, there are strict laws concerning the handling of social security numbers and the uses to which they may be put. As the issue of identity theft comes more-and-more to the forefront, this concern will merely become more-regulated. Therefore, I would strongly recommend that you never use this as a database key. SS# should be a (very confidential) column in the record.

Furthermore, it is a customer-supplied value. Sometimes, the value supplied is incorrect, or missing, or unintentionally duplicated. You should never use customer-supplied values of any sort as a database primary key. You should not use identifiers that are intended "for use by humans" as a database primary key. (Humans tolerate ambiguity ... computers do not!) Instead, let all such identifiers be column-values stored somewhere in the database, possibly (or, not ...) indexed by a UNIQUE key to prevent duplicates.

I recommend that all primary keys should be completely abstract, containing no meaning at all. For instance, a UUID could be used. Within the database, auto-incrementing integers can be used.

Mike Robinson
  • 8,490
  • 5
  • 28
  • 41
  • 1
    As you mentioned, sometimes the value supplied is incorrect. This question first came to mind because, earlier in my life, a bank mis-keyed my SSN... and then they said they couldn't change it without closing and reopening the account. – JonathanDavidArndt Jul 31 '17 at 19:02
  • 3
    This answer is just not logical. A unique, non-nullable key is the same thing as a primary key. What matters is how the key or keys are being used. The question says nothing about how the key will be used and so there is no sound basis for challenging the DBA's advice. It's entirely arbitrary to say "never" here. – nvogel Jul 31 '17 at 20:16
  • The confidentiality argument is the most important factor in this answer. – Joel Brown Aug 01 '17 at 02:39
  • 2
    Confidentiality is of course important. The confidentiality issues are the same whether SSN is a primary key or a non-primary key. The responsibility to protect the SSN from misuse is the same whether the SSN is primary key or not. – nvogel Aug 01 '17 at 04:28
  • 1
    This is exactly the answer that I would have written. You've hit every major point, except that the SSN is not guaranteed to be unique: not only is is a *customer-supplied* value, it is a value generated by an *external-entity*, a gov dept. (Although, the original question is looking for situations where using the SSN as a PK could be considered *correct*.) – JonathanDavidArndt Aug 07 '17 at 12:28
  • 1
    I had to deal with an analogous problem when doing a project for a health insurance company (CIGNA Arizona). Their "provider IDs" had embedded information and were human-issued, and there were many duplicates, and providers had to know which of their many numbers to use. Perfectly dreadful. I solved the problem by creating random unique primary-keys to unambiguously identify "providers," with a 1:M relationship to their "provider IDs." This enabled them to get the reports they needed, and to clean-up the mapping table over time. When an all-new scheme was rolled out, my software still ran. – Mike Robinson Jan 18 '18 at 16:14
  • 1
    ... and, by the way, those "random PKs" were never revealed, just in case some human might have tried to use them instead! UUID's hadn't been invented yet. The primary keys were just what a good primary key should be – arbitrary, unique, meaningless, and hidden from the user's eyes. – Mike Robinson Jan 18 '18 at 16:17
  • UUIDs had not been invented? Would you care to cite the time of your project then? Since UUIDs are 28 years old and EHR/DICOM is only Federally mandated within the last decade and has a few built-in options for deduplicating PatientID (0010,0020). It's also quite hard, nearly impossible, to get random numbers to be safely unique, especially in a HIPAA world. In fact, some hackers [dangerously misunderstand the issue](http://perlmonks.org/?node=1090653). – Ashley Feb 10 '18 at 04:22
1

Keys can only be determined from business rules. What matters is whether it makes sense in the context of your business requirements to enforce uniqueness of SSN or not. That is not a matter your DBA alone can decide on. It's something on which to consult the HR department or whoever else uses this data.

Assuming that your table will have at least one key (perhaps more than one) then I suggest that the DBA is in the best position to advise on the policy for primary keys. If the choice of a primary key is of any importance at all then the DBA ought to say so.

nvogel
  • 24,981
  • 1
  • 44
  • 82
  • 1
    The purpose of this question was to seek out alternative views, and you have certainly provided one. I am looking for a technical situation in which this kind of thing might be correct. Can you amplify some of your points? What sort of business decisions by HR might **require** this as a `PK` instead of `UNIQUE`? – JonathanDavidArndt Aug 01 '17 at 12:24
  • 1
    I don't think you understood my answer. The requirement to enforce *uniqueness* (i.e. a key) depends on business rules and the intended meaning of your table in its business context. PK is just a key - potentially one of several keys in a table. The choice of which keys you call primary, if any, is typically determined by any of three things: A)arbitrary preference, B) technical limitations of the software you are using, C) local convention. Your DBA is best placed to advise on B and C since we know nothing of the software and standards in use in your environment. – nvogel Aug 01 '17 at 14:18