4

In some cases, organizations are not permitted to use or store useful keys such as Social Security numbers (SSNs) or phone numbers.

However, these unique keys are very useful for matching data. So, theoretically, if a data provider were able to provide you with a hashed value of the SSN, and you were to store that hash and use it for matching, you would never have to use or store the SSN.

What would be an appropriate hash function for something like an SSN?

Alex
chris

3 Answers

1

You need to treat the SSN exactly like a password. Hash it with a strong, slow hash algorithm such as bcrypt or PBKDF2, using a unique per-record salt applied as both a prefix and a suffix.

The downside of hashing SSNs is that they are predictable and have very little entropy: an SSN is nine decimal digits, so there are at most 10^9 (roughly 2^30) candidates, which makes brute-forcing the plaintext quite easy. If you can afford it, I'd suggest investing in hardware protection (e.g. an HSM) for this kind of thing. In fact, you should avoid identifying people by their SSN entirely.
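As a minimal sketch of the slow, salted approach described above, the following uses PBKDF2 from Python's standard library. The function names, the iteration count, and the salt length are illustrative assumptions, not values from this answer. PBKDF2 takes a single salt argument, so the sketch uses one random per-record salt rather than a separate prefix and suffix.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Assumed parameters for illustration; tune the iteration count to your hardware.
ITERATIONS = 200_000
SALT_BYTES = 16


def hash_ssn(ssn: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest) for an SSN using PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(SALT_BYTES)  # fresh random salt for each record
    # Normalize so "000-00-0000" and "000000000" hash identically.
    normalized = ssn.replace("-", "").strip()
    digest = hashlib.pbkdf2_hmac(
        "sha256", normalized.encode("ascii"), salt, ITERATIONS
    )
    return salt, digest


def verify_ssn(ssn: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_ssn(ssn, salt)
    return hmac.compare_digest(digest, expected)
```

Note that with a per-record salt, two parties hashing the same SSN independently will get different digests, so this form supports verification against a stored record but not cross-source matching unless the salt is shared.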

Community
Polynomial
  • @HunterMcMillen The number of bits of padding doesn't have to be specific, as long as there are plenty of them before and after the data. – Polynomial Jun 06 '12 at 18:04
  • Generally, hash functions process data in blocks of n bits; if the incoming data is shorter than n bits, the algorithm pads it in a predictable way. – Hunter McMillen Jun 06 '12 at 18:06
  • @HunterMcMillen Sure, but for all practical attacks the salt is simply there to prevent collisions between equal plaintexts and stop rainbow tables from being effective. – Polynomial Jun 06 '12 at 18:08
  • The salt here seems irrelevant due to the already unique nature of an SSN. If you are going to add salt values to pad the SSN, you might as well just generate some other unique ID instead. – Hunter McMillen Jun 06 '12 at 18:11
  • @Polynomial: The SSN would only be used to make updating my data easier, and is just one of the attributes checked. A salt would defeat the purpose, unless the same salt was used by my provider. I am looking for an alternative to storing SSN in the clear. – chris Jun 06 '12 at 18:14
  • 2
    @chris Then I think the best option is to never store the SSN, and give the user a different form of unique ID. – Polynomial Jun 06 '12 at 18:15
  • @Polynomial: The problem is that different data sources have different identifiers, and I need (and have a legal right to use) something common. I just don't want to store it. – chris Jun 06 '12 at 18:21
  • @chris So generate a unique ID for each user, then have a database table that maps credentials and their sources to users. – Polynomial Jun 06 '12 at 18:45
  • Some of those sources use SSN, which I don't want to store on my system. They do not have an alternative. – chris Jun 06 '12 at 19:12
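The matching scenario discussed in these comments needs a deterministic value that both sides can compute, which a random per-record salt prevents. One possible sketch, assuming the provider and the consumer can agree on a secret key out of band, is a keyed hash (HMAC-SHA256). This is a different technique from the per-record salt in the answer, and the function name and key below are hypothetical.

```python
import hashlib
import hmac


def match_token(ssn: str, shared_key: bytes) -> str:
    """Derive a deterministic matching token from an SSN and a shared secret key."""
    # Keep only ASCII digits so formatting differences do not change the token.
    normalized = "".join(ch for ch in ssn if ch in "0123456789")
    if len(normalized) != 9:
        raise ValueError("expected a 9-digit SSN")
    return hmac.new(shared_key, normalized.encode("ascii"), hashlib.sha256).hexdigest()


# Hypothetical key for illustration only; a real key would be random and kept secret.
shared_key = b"example-key-agreed-out-of-band"

provider_token = match_token("000-00-0000", shared_key)  # computed by the provider
consumer_token = match_token("000000000", shared_key)    # computed by the consumer
print(provider_token == consumer_token)
```

Anyone who holds the key can still enumerate all 10^9 possible SSNs and rebuild the mapping, so the key needs the same protection the SSNs themselves would.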
0

So, theoretically, if a data provider were able to provide you with a hashed value of the SSN, and you were to store that hash and use it for matching, you would never have to use or store the SSN.

That is false; hashes by design are not unique and cannot be used to uniquely identify anything. If you must uniquely identify something, and are not allowed to use someone else's identifier, you must come up with your own identifier. That is why things like gas cards, movie rental cards, etc. come with their own unique membership identifiers.

Dour High Arch
  • If the provider hashes a number, and I hash the same number with the same algorithm, the hash value will be the same. I can then match my data keyed on my hash with their data, keyed on the same hash value. – chris Jun 06 '12 at 18:03
  • @Chris, if the provider hashes two different numbers they can come out to the same hash value. You would treat two different SSNs as the same one. – Dour High Arch Jun 06 '12 at 18:05
  • 1
    I believe that the point of a good hashing algorithm is to reduce or eliminate the possibility of collisions, even for small inputs. Take a look at http://stackoverflow.com/questions/4676828/when-generating-a-sha256-512-hash-is-there-a-minimum-safe-amount-of-data-to – chris Jun 06 '12 at 18:12
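The disagreement in these comments can be put in numbers with a rough estimate. The sketch below computes the standard union (birthday) bound on the chance that any two of n distinct inputs collide, under the assumption that the hash behaves like a uniformly random function with the given output size. The function name is illustrative.

```python
from fractions import Fraction


def collision_upper_bound(n_items: int, hash_bits: int) -> Fraction:
    """Upper bound on P(any collision): number of pairs divided by 2**hash_bits."""
    pairs = n_items * (n_items - 1) // 2
    return Fraction(pairs, 2 ** hash_bits)


# All 10**9 possible nine-digit SSNs hashed to a 256-bit output.
bound_256 = collision_upper_bound(10**9, 256)
print(float(bound_256))

# The same inputs with a 32-bit output: the bound exceeds 1, so it guarantees nothing.
bound_32 = collision_upper_bound(10**9, 32)
print(bound_32 > 1)
```

Under that assumption, the bound for a 256-bit output is on the order of 10^-60, so collisions are possible in principle but negligible in practice for this input space; a short output such as 32 bits gives no such assurance.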
0

True, but you can still use the hash to uniquely fingerprint the SSN, relying on the second-preimage resistance of the cryptographic hash function. As said above, hash it with a strong, slow hash algorithm and a unique per-record prefix and suffix salt, because the input is so small.