2

My main problem is that I would like to check if someone with the same SSN has multiple accounts with us. Currently all personally identifiable info is encrypted and decryption takes a non-trivial amount of time.

My initial idea was to add an ssn column to the user table in the database. Then I could simply run a query to get all users with the same SSN as user A.

I don't want to store the SSN in plaintext in the database. I was thinking of just salting and hashing it somehow.

My main question is: is this secure (or how secure is it)? Is there a simple way to salt and hash, or encrypt, an SSN using Python?

Edit: The SSNs do not need to be displayed.

This is using a MySQL database.

Shatnerz
  • http://stackoverflow.com/questions/9594125/salt-and-hash-a-password-in-python maybe you find inspiration in this question. – cdonner Jan 11 '17 at 19:39
  • Your database engine may enable you to enforce uniqueness in any database column. Too bad you didn't specify it. – Dan Bracuk Jan 11 '17 at 19:50
  • @DanBracuk I'm using MySQL. Are there DBs that enforce uniqueness in every column? – Shatnerz Jan 11 '17 at 19:53

3 Answers

4

Do not encrypt SSNs: when the attacker gets the DB, he will also get the encryption key.

Just using a hash function is not sufficient, and just adding a salt does little to improve the security.

Instead, handle the SSNs in the same manner as passwords: iterate over an HMAC with a random salt for about 100 ms and save the salt with the hash. Use functions such as PBKDF2 (aka Rfc2898DeriveBytes), password_hash/password_verify, Bcrypt and similar functions. The point is to make the attacker spend a lot of time finding the values by brute force. Protecting your users is important, so please use secure password methods.
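The "about 100 ms" target depends on the hardware, so the iteration count has to be measured rather than guessed. The following is a minimal sketch of one way to do that with Python's `hashlib.pbkdf2_hmac`; the function name `estimate_iterations`, the probe size, and the sample input are illustrative assumptions, not part of the answer.

```python
import hashlib
import os
import time


def estimate_iterations(target_seconds=0.1, probe_iterations=20_000):
    """Estimate a PBKDF2 iteration count that takes about target_seconds here.

    Hypothetical helper: it times one probe run and scales the result.
    """
    salt = os.urandom(16)
    start = time.perf_counter()
    hashlib.pbkdf2_hmac("sha256", b"123-45-6789", salt, probe_iterations)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against zero
    # PBKDF2 cost grows linearly with the iteration count, so scale the probe.
    return max(1, int(probe_iterations * target_seconds / elapsed))


if __name__ == "__main__":
    print("iterations for ~100 ms:", estimate_iterations())
```

The result varies between machines and runs, so it is best treated as a rough lower bound and re-measured on the production hardware.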

zaph
1

As per @zaph's advice, I decided to use PBKDF2. I can then create a BIT column and index that.

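My simple hashing looks like:

import os
import hashlib


def hash_function(input_str, salt=None):
    """Run pbkdf2_hmac with a 20-byte salt and 120,000 rounds on the input.

    Returns (salt, digest); the salt must be stored to recompute the hash.
    """
    if salt is None:
        salt = os.urandom(20)
    # pbkdf2_hmac requires bytes, so encode the string first.
    digest = hashlib.pbkdf2_hmac('sha256', input_str.encode('utf-8'), salt, 120000)
    return salt, digest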
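A fresh random salt gives a different digest on every call, so two rows holding the same SSN cannot be matched by comparing stored hashes (a point also raised in the comments). One possible workaround, sketched below, swaps the per-row random salt for a single secret value kept outside the database, which makes the digest deterministic and therefore indexable. This is a different technique from the per-row salt described above, not the accepted method; `LOOKUP_KEY`, `normalize_ssn`, and `ssn_lookup_hash` are hypothetical names.

```python
import hashlib

# Hypothetical application-wide secret, assumed to be loaded from
# configuration that lives outside the database.
LOOKUP_KEY = b"replace-with-a-secret-loaded-from-config"
ITERATIONS = 120_000  # same round count as the code above


def normalize_ssn(ssn):
    """Keep only digits so '123-45-6789' and '123456789' hash identically."""
    digits = "".join(ch for ch in ssn if ch.isdigit())
    if len(digits) != 9:
        raise ValueError("expected a 9-digit SSN")
    return digits


def ssn_lookup_hash(ssn, key=LOOKUP_KEY):
    """Deterministic PBKDF2 digest: equal SSNs give equal 32-byte values."""
    data = normalize_ssn(ssn).encode("ascii")
    # The secret key is used as a fixed salt, so the same SSN always
    # produces the same digest and can be matched with a plain equality test.
    return hashlib.pbkdf2_hmac("sha256", data, key, ITERATIONS)
```

The 32-byte result could be stored in an indexed fixed-width binary column (for example an assumed `ssn_hash BINARY(32)`) and matched with an ordinary equality query. The trade-off is that if the secret leaks together with the database, an attacker only has to run the slow hash once per candidate SSN to cover every row.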
Shatnerz
-1

Your question doesn't make it clear if you need to display those SSNs. I'm going to assume you do not. Store the SSN in a SHA2 hash. You can then do a SQL query to search against those hashed values. Store only the last 4 digits encrypted for display.

Nigel Feasey
  • How will the encryption key be kept secure? – zaph Jan 11 '17 at 19:51
  • 1
    Using SHA2 on a laptop (an attacker's machine will be orders of magnitude faster) one can perform a hash in < 1 µs. All 10^9 SSNs can be tried in < 20 minutes, much less for an attacker. There is no security there. – zaph Jan 11 '17 at 20:01
  • Just using a hash function is not sufficient and just adding a salt does little to improve the security. Instead, iterate over an HMAC with a random salt for about 100 ms and save the salt with the hash. Use functions such as `PBKDF2` (aka `Rfc2898DeriveBytes`), `password_hash`/`password_verify`, `Bcrypt` and similar functions. The point is to make the attacker spend a lot of time finding passwords by brute force. Protecting your users is important, please use secure password methods. – zaph Jan 11 '17 at 20:01
  • @zaph I just came to that conclusion around the time you posted your comments. I can do that fairly easily with Python's hashlib. If you post a solution, I will accept it. – Shatnerz Jan 11 '17 at 20:14
  • I think you're confusing SHA2 with SHA1, which can be easily broken. Using a random salt will mean you can't compare two values. Using SSNs as passwords is a bad idea, and that wasn't represented in this question. – Nigel Feasey Jan 11 '17 at 20:45
  • @NigelFeasey No, it is not about "breaking" SHA1; it is about an exhaustive brute-force attack on the 10^9 SSNs. The same applies to any SHA2 or SHA3 hashing algorithm. – zaph Jan 11 '17 at 21:00
  • @NigelFeasey -- you can store the salt in the clear -- the point of the salt is to stop people from using pre-calculated tables. – Hogan Jan 11 '17 at 21:00
  • @Hogan Pre-calculated (rainbow) tables are not needed; SHA2 and SHA3 functions are fast enough to do a brute-force attack given the limited range of SSNs. What is needed is to slow down the calculation; that is the point of the iteration count. – zaph Jan 11 '17 at 21:06
  • @zaph -- the real issue is there are so few SSNs – Hogan Jan 11 '17 at 21:07
  • @Hogan See my comments, I have been saying that. Raising the time to calculate from sub-µs to 100 ms makes a huge difference. – zaph Jan 11 '17 at 22:54
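
The numbers in the comments above can be checked with a little arithmetic. This sketch assumes the figures quoted there (10^9 possible SSNs, roughly 1 µs per plain SHA2 hash, roughly 100 ms per iterated hash) and a single attacker core; the function name is illustrative.

```python
SSN_SPACE = 10**9  # nine decimal digits, as stated in the comments


def exhaustive_search_seconds(seconds_per_hash, candidates=SSN_SPACE):
    """Time to hash every candidate once at the given per-hash cost."""
    return candidates * seconds_per_hash


fast = exhaustive_search_seconds(1e-6)  # plain SHA2, about 1 microsecond each
slow = exhaustive_search_seconds(0.1)   # iterated hash, about 100 ms each

print(f"plain hash:    {fast / 60:.1f} minutes")               # 16.7 minutes
print(f"iterated hash: {slow / (365 * 24 * 3600):.1f} years")  # 3.2 years
```

With a per-row random salt the slow search has to be repeated for each row, whereas a fast unsalted hash lets a single pass over the SSN space cover the whole table.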