
I've been told that you should not store users' passwords in a database, but how can I authenticate users if I cannot save their passwords? Is simply encrypting them enough to keep them safe?

There have been several stories in the news lately of high-profile sites that have been compromised, like LinkedIn, and I don't think such a high-profile site would store plain-text passwords, so I would assume they were encrypted.

Rachel
Frankie
  • @GiulioMuscarello, quoting from Jeff himself *it is not merely OK to ask and answer your own question, it is explicitly encouraged*! ;) More on that here: http://blog.stackoverflow.com/2011/07/its-ok-to-ask-and-answer-your-own-questions/ Nonetheless, thanks for the comment! – Frankie Nov 20 '12 at 16:19
  • Sorry, didn't notice the self-reply. – Giulio Muscarello Nov 20 '12 at 16:38
  • StackOverflow is not a Wiki where you ask a (although seemingly popular) question and answer it yourself at the same time. – Gumbo Nov 20 '12 at 18:41
  • @Gumbo, please read this http://blog.stackoverflow.com/2011/07/its-ok-to-ask-and-answer-your-own-questions/, think about it and then give me your feedback again on the issue. I have absolute respect for the community and you're a great part of it. So if it's not in the spirit of it and I'm in the wrong, I'll be the first to propose the deletion. Thanks. – Frankie Nov 20 '12 at 19:12
  • possible duplicate of [Best way to store password in database](http://stackoverflow.com/questions/1054022/best-way-to-store-password-in-database) – Jon 'links in bio' Ericson Nov 20 '12 at 19:41
  • @Frankie I removed the part of your question about LinkedIn, as your question was getting some not-constructive close votes. Feel free to rollback this change if you don't agree with it :) – Rachel Nov 20 '12 at 19:47
  • @Rachel: The answer refers to the LinkedIn question. – Robert Harvey Nov 20 '12 at 19:48
  • Ask the experts! [How to securely hash passwords?](http://security.stackexchange.com/questions/211/how-to-securely-hash-passwords) Also related: [How to store salt?](http://security.stackexchange.com/questions/17421/how-to-store-salt), and going into more detail [Do any security experts recommend bcrypt for password storage?](http://security.stackexchange.com/questions/4781/do-any-security-experts-recommend-bcrypt-for-password-storage), as well as many other [passwords](http://security.stackexchange.com/questions/tagged/passwords) questions. – Gilles 'SO- stop being evil' Nov 20 '12 at 19:56
  • And regarding the LeakedIn leak: [Why would salt not have prevented LinkedIn passwords from getting cracked?](http://security.stackexchange.com/questions/15910/why-would-salt-not-have-prevented-linkedin-passwords-from-getting-cracked) – Gilles 'SO- stop being evil' Nov 20 '12 at 19:59
  • @RobertHarvey I see what you mean. I've made another edit attempt to get this reopened, and have made sure to leave the LinkedIn example in the question :) – Rachel Nov 20 '12 at 19:59
  • Why is it "Not Constructive" in the first place? The answer demonstrates that it is definitively answerable. – Robert Harvey Nov 20 '12 at 20:02
  • @RobertHarvey You'd have to ask the close voters that, but I would guess it had something to do with having so many different questions in the question itself – Rachel Nov 20 '12 at 20:07
  • The first thing we need to do is stop people from calling hashed passwords "encrypted." For Cthulhu's sake, hashing and encryption are completely different concepts: hashing isn't reversible, encryption is. – NullUserException Nov 20 '12 at 20:30
  • @NullUserException: Which is the pivotal concept that posts like these elucidate. – Robert Harvey Nov 20 '12 at 20:32
  • @RobertHarvey it's "not constructive" because there are many different ways of addressing the problem, and people are often quite opinionated about what is the "correct" way to do it. It's not "not constructive" because it can't be answered, but because there are so many ways that it could be answered and the existing answer just chose one. Having said all that, I chose NARQ as a close reason; they both apply, but that is a better fit. – Servy Nov 20 '12 at 20:34
  • [related meta post](http://meta.stackexchange.com/questions/156271/should-my-self-answered-question-be-deleted) since it seems nobody has posted it before now. – Servy Nov 20 '12 at 20:37

2 Answers


Disclaimer: I originally posted this on Quora but felt that the answer was better suited to Stack Overflow.

The way to store and check user passwords without actually keeping the passwords is to store only a hash of each password, then hash the user's input at login and compare it to the stored hash.

What is hashing?

Hashing is the process of passing data of any length (short passwords, long passwords, binary files, whatever) through an algorithm that returns a fixed-length value called a hash. Hashes only work one way: you cannot recover the input from the hash. A *.img file of several MB can be hashed exactly like a password. (In fact, it's common practice to hash large files to check their integrity: say you download a file using BitTorrent; when the download is complete, the software hashes it and compares the hash of what you have with the hash of what you were supposed to have. If they match, the download is not corrupt.)
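A minimal sketch of both points, using SHA-256 from Python's standard `hashlib` (one of the algorithms the answer lists). The helper names `sha256_hex` and `file_is_intact` are hypothetical, and the file check is a generic "compare against a published digest" pattern, not BitTorrent's actual piece-verification scheme.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as 64 hex characters."""
    return hashlib.sha256(data).hexdigest()

# Inputs of very different sizes all map to a digest of the same length.
short_digest = sha256_hex(b"pass123")
large_digest = sha256_hex(b"\x00" * 5_000_000)  # stands in for a multi-MB file
assert len(short_digest) == len(large_digest) == 64

def file_is_intact(path: str, expected_hex: str) -> bool:
    """Hash a downloaded file in chunks and compare it with the expected digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read 64 KiB at a time so large files never have to fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest() == expected_hex
```

Reading in chunks means the memory cost stays constant no matter how big the file is, while the digest length stays fixed.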

How does auth with hashes work?

When the user registers, he gives a password, say pass123, which is then hashed (by any of the available hashing algorithms: SHA-1, SHA-256, etc.; in this case MD5) to the value 32250170a0dca92d53ec9624f336ca24, and that value is stored in the database. Every time the user tries to log in, the system hashes the submitted password on the fly and compares it to the stored hash; if they match, he's good to go. You can try an online MD5 hasher here: http://md5-hash-online.waraxe.us/
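The register/login flow described above can be sketched as follows. This assumes a hypothetical in-memory `users` dictionary standing in for the database, and it deliberately mirrors the answer's unsalted MD5 example; as the comments on this answer point out, fast unsalted hashes like this are not a safe way to store real passwords.

```python
import hashlib
import hmac

# Hypothetical "users table": username -> stored MD5 hex digest.
# Unsalted MD5 mirrors the example in the text; it is NOT safe for real use.
users: dict[str, str] = {}

def md5_hex(password: str) -> str:
    return hashlib.md5(password.encode("utf-8")).hexdigest()

def register(username: str, password: str) -> None:
    # Only the hash is stored; the plain-text password is discarded.
    users[username] = md5_hex(password)

def login(username: str, password: str) -> bool:
    stored = users.get(username)
    if stored is None:
        return False
    # Hash the submitted password now and compare it with the stored hash.
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(md5_hex(password), stored)

register("alice", "pass123")
print(login("alice", "pass123"))  # True
print(login("alice", "pass321"))  # False
```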

What if two hashes are the same? Could a user log in with a different password?

He could! That is called a collision. Say that in a fictional hashing algorithm the value pass123 produced the hash ec9624 and the value pass321 produced the exact same hash; that hashing algorithm would be broken. Both common algorithms, MD5 and SHA-1 (the one LinkedIn used), are considered broken because collisions have been found. Being broken does not necessarily mean it's unsafe.
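To make the idea of a collision concrete, here is a toy sketch. It assumes a deliberately weak, hypothetical `toy_hash` that keeps only the first 4 hex digits (16 bits) of SHA-256, similar in spirit to the fictional short hash above. With only 65,536 possible outputs, trying inputs in sequence is guaranteed to produce two different strings with the same hash; real MD5 outputs 128 bits, so this is an illustration, not an attack on MD5.

```python
import hashlib
from itertools import count

def toy_hash(password: str, hex_chars: int = 4) -> str:
    """Deliberately weak hash: keep only the first `hex_chars` hex digits of SHA-256."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()[:hex_chars]

def find_collision(hex_chars: int = 4) -> tuple[str, str]:
    """Try 'pass0', 'pass1', ... until two different inputs share a toy hash."""
    seen: dict[str, str] = {}
    for i in count():
        candidate = f"pass{i}"
        h = toy_hash(candidate, hex_chars)
        if h in seen:
            return seen[h], candidate
        seen[h] = candidate

a, b = find_collision()
print(a, b, toy_hash(a), toy_hash(b))  # two different inputs, one hash
```

Because of the birthday effect, a collision typically shows up after only a few hundred attempts, far fewer than the 65,536 possible outputs.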

How can you exploit collisions?

If you can generate an input whose hash is the same as the hash of the user's password, you can identify yourself to that site as that user.
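A sketch of that exploit against the same hypothetical 16-bit `toy_hash`: the attacker knows only the stored hash and brute-forces any string that matches it. Strictly speaking, as one of the comments below notes, matching a given stored hash is a pre-image attack rather than a collision, and against full-length MD5 or SHA-1 it remains far out of reach; the tiny output space is what makes it trivial here.

```python
import hashlib
from itertools import count

def toy_hash(password: str, hex_chars: int = 4) -> str:
    """Deliberately weak 16-bit hash (first 4 hex digits of SHA-256), for illustration only."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()[:hex_chars]

# Hypothetical stored record: the site only knows toy_hash("pass123").
stored_hash = toy_hash("pass123")

def find_matching_input(target: str) -> str:
    """Brute-force strings 'guess0', 'guess1', ... until one hashes to target."""
    for i in count():
        guess = f"guess{i}"
        if toy_hash(guess) == target:
            return guess

forged = find_matching_input(stored_hash)
# The site only compares hashes, so this different string would be accepted.
assert forged != "pass123"
assert toy_hash(forged) == stored_hash
```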

Rainbow-table attacks

Crackers quickly realized that once they had captured a table of hashed passwords, cracking the passwords one by one would not be feasible, so they devised a new attack vector. They would generate every possible password (aaa, aab, aac, aad, etc.) and store all the hashes in a database. Then they would only need to search for the stolen hash in the database of precomputed hashes (a sub-second query) and get the corresponding password.
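The attack described above, in its simplest form, is a precomputed lookup table. This sketch builds one for every 3-letter lowercase string under unsalted MD5; the names are hypothetical. Real rainbow tables compress this idea with hash chains and reduction functions to trade storage for computation, which this sketch does not implement.

```python
import hashlib
from itertools import product
from string import ascii_lowercase

def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def build_lookup_table(length: int) -> dict[str, str]:
    """Hash every lowercase string of the given length (aaa, aab, aac, ...)."""
    table = {}
    for letters in product(ascii_lowercase, repeat=length):
        candidate = "".join(letters)
        table[md5_hex(candidate)] = candidate
    return table

table = build_lookup_table(3)   # 26**3 = 17,576 entries, computed once
stolen_hash = md5_hex("qaz")    # pretend this came from a leaked, unsalted table
print(table.get(stolen_hash))   # a single dictionary lookup recovers "qaz"
```

The expensive part (hashing every candidate) happens once, up front; every stolen unsalted hash after that costs only a lookup.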

Salt to the rescue (and where LinkedIn failed big!)

Security is defined by the amount of time it takes a cracker to break your password and how often you change it. With rainbow tables that time drops dramatically, so the industry came up with salt. What if every password had a unique twist? That's salt! For every user who registers, you generate a random string, say 3 characters (the industry recommends 16 chars - https://stackoverflow.com/a/18419...). Then you concatenate the user's password with that random string before hashing it.

password - salt - sha1 hash  
qwerty   - 123  - 5cec175b165e3d5e62c9e13ce848ef6feac81bff  
qwerty   - 321  - b8b92ab870c50ce5fc59571dc0c77f9a4a90323c  
qazwsx   - abc  - c6aec64efe2a25c6bc35aeea2aafb2e86ac96a0c  
qazwsx   - cba  - 31e42c24f71dc5a453b2635e6ec57eadf03090fd  

As you can see, the exact same passwords, given different salt values, generate completely different hashes. That is the purpose of salt, and why LinkedIn failed big. Notice that in the table you only store the hash and the salt! Never the password!
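A minimal sketch of salted storage and verification, assuming the salt is appended to the password before hashing with SHA-1 (the table above doesn't state the concatenation order, so these digests won't necessarily reproduce its values). The helper names are hypothetical, and the salt is 16 hex characters from Python's `secrets` module. A single fast SHA-1 pass is still cheap to brute-force per user; the comments below recommend dedicated slow functions such as bcrypt instead.

```python
import hashlib
import hmac
import secrets

def hash_with_salt(password: str, salt: str) -> str:
    # Concatenate password and salt, then hash (SHA-1 to match the table above).
    return hashlib.sha1((password + salt).encode("utf-8")).hexdigest()

def create_record(password: str) -> tuple[str, str]:
    """Return (salt, hash) to store; the password itself is never stored."""
    salt = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    return salt, hash_with_salt(password, salt)

def check_password(password: str, record: tuple[str, str]) -> bool:
    salt, stored_hash = record
    # Re-hash with the stored salt and compare in constant time.
    return hmac.compare_digest(hash_with_salt(password, salt), stored_hash)

r1 = create_record("qwerty")
r2 = create_record("qwerty")
print(r1[1] != r2[1])  # same password, different salts -> different hashes
```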

The first thing the people who got their hands on the LinkedIn hashes did was sort the hashes and look for matches (there were matches, because multiple users had the same password - shame on them!); those users were the first to fall. If the password table had been salted, none of that would have happened, and they would have needed an excruciating amount of time (and computing resources) to crack every single password. That would have given LinkedIn plenty of time to enforce a new password policy.
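The "sort and look for matches" step can be sketched with a counter over a leaked hash list. The passwords below are made-up sample data, and the attacker is assumed to see only the hashes. Without salt, identical passwords produce identical hashes and stand out immediately; with a random per-user salt, every stored hash is distinct.

```python
import hashlib
import secrets
from collections import Counter

def sha1_hex(text: str) -> str:
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

# Hypothetical passwords for five users (the attacker sees only the hashes).
passwords = ["123456", "password", "123456", "qwerty", "123456"]

unsalted = [sha1_hex(p) for p in passwords]
salted = [sha1_hex(p + secrets.token_hex(8)) for p in passwords]

def shared_hashes(hashes: list[str]) -> dict[str, int]:
    """Hashes that appear more than once, i.e. users who share a password."""
    return {h: n for h, n in Counter(hashes).items() if n > 1}

print(len(shared_hashes(unsalted)))  # one hash shared by three users
print(len(shared_hashes(salted)))    # salts make every stored hash distinct
```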

Hope the technical side of the answer gave insight as to how authentication works (or should work).

Frankie
  • The fact that a hashing algorithm has collisions doesn't make it "broken". If you don't restrict the size of your password to a low enough limit then there will be more possible passwords than hashes (which is almost always the case with any hashing algorithm). Good hashing algorithms simply minimize collisions as much as possible, and make an attempt for "similar" input values to not result in the same hash. Additionally, the size of the hash is inversely proportional to the probability of collision (with a non-terrible algorithm). The more possible hash values, the better. – Servy Nov 20 '12 at 19:44
  • -1 This answer contains several outdated concepts and should not be used as a reference for password security. Possibly part of this is because it tries to answer a question that's too broad and would require a much, much longer answer to be complete. – NullUserException Nov 20 '12 at 20:34
  • Badly written, doesn't mention essential concepts like strengthening, and the part about collisions is highly misleading. A collision would allow you to create a user for which you know two different valid passwords. Not a very useful attack. Finding a password that matches another user's hash, is called a pre-image, and that's still extremely expensive, even with MD5. – CodesInChaos Nov 20 '12 at 20:41
  • What @CodesInChaos means by "extremely expensive" is an attack in the order of 2^123.4, which would take more than the age of the universe to complete. Note that the currently feasible attacks on MD5 are chosen-prefix attacks, which are basically useless when it comes to attacking hashed passwords. That's not to say MD5 is a good algorithm to use for password protection (neither is any general purpose hashing function, like the SHA-* family). Failing to mention dedicated password hashing functions like bcrypt (and why they should be used) is the biggest issue I have with this answer. – NullUserException Nov 20 '12 at 20:52
  • @NullUserException, I believe you, but at least after reading this question I've learnt something about salt. Maybe there should be a 'further reading' footer on here? – Benjol Nov 21 '12 at 08:57

I really like it when somebody asks this question, because it means they want to do it better. By knowing only a few important points, even well-known sites could have avoided a lot of trouble.

I recently wrote a tutorial about hashing passwords, using (hopefully) easy and understandable language. It lets you play around with SQL injection, explains the use of salt and pepper, and points out the need for slow key-derivation functions.
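The tutorial isn't linked here, so the following is not its code; it is a sketch of the same ideas (salt, pepper and a slow key-derivation function) using PBKDF2-HMAC-SHA256 from Python's `hashlib`. The iteration count, the `PEPPER` value and the helper names are assumptions for illustration; bcrypt, scrypt or Argon2 are common alternatives to PBKDF2. Here the pepper is an application-wide secret kept outside the database and mixed in with HMAC before stretching, which is one common approach among several.

```python
import hashlib
import hmac
import secrets

# Assumed work factor; tune it so one hash takes a noticeable fraction of a second.
ITERATIONS = 600_000
# Hypothetical application-wide secret ("pepper"), kept outside the database,
# e.g. in a config file or environment variable.
PEPPER = b"replace-with-a-long-random-secret"

def _peppered(password: str) -> bytes:
    # Mix the pepper into the password with HMAC-SHA256 before stretching.
    return hmac.new(PEPPER, password.encode("utf-8"), hashlib.sha256).digest()

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) to store in the database."""
    salt = secrets.token_bytes(16)  # unique random salt per user
    key = hashlib.pbkdf2_hmac("sha256", _peppered(password), salt, ITERATIONS)
    return salt, key

def verify_password(password: str, salt: bytes, key: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", _peppered(password), salt, ITERATIONS)
    return hmac.compare_digest(candidate, key)
```

The high iteration count is what makes this "slow": each guess an attacker tries costs the same work as a legitimate login, which multiplies the cost of the lookup-table and brute-force attacks discussed in the answer above.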

martinstoeckli