
I'm using Vaadin, Spring and JPA in my project. I need to check how strong a user's password is and tell the user, and I want to do it in pure Java.

Can you recommend the best way to do it? Is it better to use a special library, or just to check with regular expressions whether the password has at least one number, etc.? What do you think?

If you have a link to a good library or tutorial on this, please send it.

vishal_g
akuzma
  • http://stackoverflow.com/questions/1614811/how-do-i-measure-the-strength-of-a-password here you can find an idea – DoppiaD Jan 29 '13 at 20:36

2 Answers


In general, the strength of someone's password depends on how guessable it is. A more complex password is less guessable. You might estimate complexity by relating it to how likely a dictionary attack or brute-force algorithm would be to guess the password. Once a password is verified not to be a common one, you might estimate its complexity by how long a brute-force search would take to guess it. In this sense, you might measure how many binary bits of entropy a similar random password would have. This is a rough estimate of guessability, although as one commenter points out, measuring complexity is difficult.

In one security class I took, the instructor suggested aiming for at least 40 bits of entropy, with 60 considered secure (assuming the password can't be distinguished from a randomly chosen one). So the password meter's job is to guess at the entropy of an equivalent random password. Obviously, there are a lot of pitfalls here, and using an established library would save you a ton of time.

The entropy per character is log10(number_possible_characters)/log10(2), so you might calculate entropy as pwd.length() * Math.log10(numPossibleCharacters)/Math.log10(2) -- and with 26 characters that's about 4.7 bits of entropy per character.
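In Java, that calculation is a one-liner. A minimal sketch (the class and method names are my own; `numPossibleCharacters` is whatever alphabet size you settle on):

```java
// Bits of entropy of a random password of the given length, drawn
// uniformly from an alphabet of numPossibleCharacters symbols.
public class EntropyEstimate {
    static double entropyBits(int length, int numPossibleCharacters) {
        return length * Math.log10(numPossibleCharacters) / Math.log10(2);
    }

    public static void main(String[] args) {
        System.out.printf("%.2f%n", entropyBits(1, 26)); // ~4.70 bits per lowercase letter
        System.out.printf("%.2f%n", entropyBits(8, 26)); // ~37.60 bits for 8 of them
    }
}
```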

Measuring entropy in a purely scientific way isn't always a valid estimation of guessability, since a password "abc123" might look as good (to a computer, not a human) as "j3s9fn", but humans can easily see that the first one is much more likely to be guessed than the second. Strictly speaking, entropy is a measure of true randomness; while we know humans aren't random, they may be close enough to random that we measure entropy as a substitute, and hope our algorithm measures the kind of randomness/creativity/unguessability that humans produce. Thus, this is both a technical and a human-behavior question.

If I recall correctly, I was once told that English prose (written words on a page) only has about 1.5 bits of entropy per character--in other words, it is far more predictable, and so you'd need about 3 times as long a password for it to be "strong" if you just type English words, than if you type random lowercase gibberish. Frankly, I think password systems which REQUIRE symbols/numbers/uppercase are dumb, for this reason: I should be able to just type more prose if I want. YMMV.

We should start measuring entropy by estimating how many characters a user could have chosen, or was likely to choose from, for each character they type.

The number of possible characters numPossibleCharacters is based either on what character sets the user appears to use, or what you allow them to use. For example, if a user typed abc then you would assume they're choosing only from 26 possible characters (4.7 per char). However, if they type aBc you'd assume they chose from 52 (lowercase and uppercase) (5.7 per char). If they use numbers, add another 10 possible characters (unless they seem to be choosing obvious numbers only).
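Inferring the alphabet size from the character classes actually present might look like this (a sketch under the set sizes described above; class and method names are my own):

```java
// Guess the alphabet size a user appears to be drawing from, based on
// which character classes occur in the password: 26 for lowercase,
// +26 for uppercase, +10 for digits.
public class CharsetSize {
    static int charsetSize(String pwd) {
        int size = 0;
        if (pwd.chars().anyMatch(Character::isLowerCase)) size += 26;
        if (pwd.chars().anyMatch(Character::isUpperCase)) size += 26;
        if (pwd.chars().anyMatch(Character::isDigit))     size += 10;
        return size;
    }
}
```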

Also, users who add numbers as an afterthought tend to put them at the start or end of a password. So the number of times they change character sets might also be a good measure of password strength. For example, "word908" is clearly less secure than "w39or7d". When you do this, you step outside simple estimates of complexity.

If you use this looser definition of complexity, it's much harder to measure how likely a password is to be guessed by an attack of any sort, although you might try and imagine a smart attack that chooses semi-random passwords starting with the most guessable and most likely patterns. You might say the added complexity (entropy?) of each character is dependent on whether it's in the same character set as the immediately previous character, whether it's sequential from the previous, or if it repeats some obvious patterns (sequences like "123" or "abc").

You might say that each switch from one set of characters to another (lowercase to upper, or digits to symbols) is itself a random incident. Let's say we define 5 character sets: lowercase, uppercase, digits, common_symbols, and uncommon_symbols. We detect how many of those sets are in use (e.g. if the user types '123abc' then charSetsUsed is 2). Then, we loop over the characters in the string one at a time. Each time the user changes character sets we say that entropy += log(charSetsUsed)/log(2). Then, for each character, we also add entropy += log(charsInThisCharSet)/log(2). [ edit: this isn't really entropy, so perhaps you should read it as estimated complexity ]
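That loop can be sketched as follows. The set sizes are assumptions (26 lower, 26 upper, 10 digits; 32 for symbols is a guess), and for brevity this collapses the two symbol sets described above into one "other" bucket:

```java
// Estimated complexity in bits: for each character, add log2 of its
// set's size; each switch between sets adds log2(charSetsUsed).
public class ComplexityEstimate {
    // Character-set id: 0 = lowercase, 1 = uppercase, 2 = digit, 3 = other
    static int setOf(char c) {
        if (Character.isLowerCase(c)) return 0;
        if (Character.isUpperCase(c)) return 1;
        if (Character.isDigit(c))     return 2;
        return 3;
    }

    static int sizeOf(int set) {
        switch (set) {
            case 0: case 1: return 26;
            case 2: return 10;
            default: return 32; // rough guess for symbols
        }
    }

    static double log2(double x) { return Math.log(x) / Math.log(2); }

    static double estimate(String pwd) {
        long charSetsUsed = pwd.chars().map(c -> setOf((char) c)).distinct().count();
        double bits = 0;
        int prevSet = -1;
        for (char c : pwd.toCharArray()) {
            int set = setOf(c);
            if (prevSet != -1 && set != prevSet) {
                bits += log2(charSetsUsed);  // a set switch is itself a choice
            }
            bits += log2(sizeOf(set));       // the character within its set
            prevSet = set;
        }
        return bits;
    }
}
```

For '123abc' this gives one switch among 2 sets in use (1 bit), plus 3 digits and 3 lowercase letters, about 25 bits total.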

If you really want to get technical, you can measure the number of character set changes. Let's say a password is 10 characters long. It could have 1 to 9 set changes, which is 10 options. We then say they are distributed as a combination of slots. So we do this:

 log(numChanges)/log(2) + log(combination(totalSlots, usedSlots))/log(2).  

Let's say a user enters aoq35esm42. We see 3 places where they switched from one character set to another. It's a 10-character password, so there are 9 possible positions for a set change (where a position occurs between any of the two characters), and their order doesn't matter (hence the combination/binomial coefficient/n choose r):

 log(numChanges)/log(2) + log(combination(9, 3))/log(2)
 log(3         )/log(2) + log(    84           )/log(2)
 1.5849625              + 6.3923174
 = 7.9772799

So we see that we have almost 8 bits of entropy that basically says "out of all the places and times they could have chosen to change character sets, there's about 8 random bits or 2^8 possibilities of what they could have done here". If we then calculate the entropy for each character based on its subset, and the choice of subsets to use on each change, we then add to the entropy probably like this (correct me if I make a mistake on my math):

  • aoq35esm42 has 6 lowercase-alphabetic characters, for 6 * log(26)/log(2), = 28.2026383
  • aoq35esm42 has 4 numeric characters, for 4 * log(10)/log(2), = 13.2877124
  • aoq35esm42 has 3 charset changes and 2 charsets, for 3 * log(2)/log(2), = 3
  • earlier, we calculated the placement of charset changes at 7.97727992 bits of entropy
  • we foolishly assume then that the total is about 52 bits (28.2 + 13.3 + 3 + 8), so it's a reasonably strong password.
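The bullet arithmetic above, sketched in plain Java (with a hand-rolled binomial coefficient so no library is needed; the numbers mirror the worked example for aoq35esm42):

```java
// Total the worked example: per-character bits, set-choice bits at each
// switch, and the placement bits for where the 3 switches fall among 9 slots.
public class WorkedExample {
    static double log2(double x) { return Math.log(x) / Math.log(2); }

    // n choose k via the incremental product, exact in long for small n.
    static long binomial(int n, int k) {
        long r = 1;
        for (int i = 1; i <= k; i++) r = r * (n - i + 1) / i;
        return r;
    }

    static double total() {
        double perChar   = 6 * log2(26)    // six lowercase letters
                         + 4 * log2(10);   // four digits
        double setChoice = 3 * log2(2);    // 3 switches among 2 sets in use
        double placement = log2(3) + log2(binomial(9, 3)); // where switches fall
        return perChar + setChoice + placement;
    }

    public static void main(String[] args) {
        System.out.printf("%.2f%n", total()); // ~52.47
    }
}
```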

Combinations are calculated using the apache commons MathUtils.binomialCoefficientDouble() function.

Also, passwords are considered weak if they contain dictionary words. So if you can scan against a dictionary, you might assume (guess) that those characters should be measured as English prose, at 1.5 bits of entropy per character. However, you don't want to expose the password to a database query (where it might be logged), so your best bet is to guess whether there are English words in it (good luck) -- for example, by assuming (rather badly) that any group of 3-5 single-case alphabetical characters containing a vowel is an English word. Or you could build a dictionary of the most common English syllables, which would be much easier to hold in memory. Or you could just give up and assume your users will game the system anyway to make their passwords more memorable.
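The crude 3-5-letter heuristic could be a single regex (a rough sketch, lowercase runs only -- it will happily miss "strength" and flag "cat"):

```java
// Treat any run of 3-5 lowercase letters containing a vowel as a
// probable English word, per the heuristic described above.
public class WordishCheck {
    static boolean looksLikeWord(String run) {
        return run.matches("(?=[a-z]*[aeiou])[a-z]{3,5}");
    }
}
```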

Still, if you're determined to be accurate, you could store an in-memory database of English words in secure memory.

All of this adds up to a rather complex algorithm that is still probably incomplete. You should probably go with a relatively easy calculation, or use a library that someone else has already written.

Secure storage

Measuring password strength doesn't account for an entirely different problem: keeping the password safe as it goes across the wire, or in memory on the client. I'm not an expert here, but I've read enough to know to be wary and do my homework. At a minimum, I would use HTTPS and strongly consider using the browser's native password input field. This is another reason to avoid rolling your own technology and instead use a library (as long as you can tell they've done more homework than you can manage; I wouldn't trust a technology unless I'd looked into it).

Additionally, you should never store passwords in clear text; use one-way hashing instead. Salt your passwords, and hash them with a secure hashing algorithm (not MD5; prefer SHA-2, say), possibly iterating the hash many times to increase the cost of a dictionary attack. There are libraries that take care of this sort of thing for you even if they don't measure password strength; I know C#/.NET has libraries aimed specifically at salting, hashing, and iteration counts. There are probably other issues I haven't mentioned here -- just remember password strength is only one link in your security chain.
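For the salting-and-iterated-hashing part, the JDK already ships PBKDF2, so a minimal sketch needs no third-party code (the iteration count here is illustrative, not a recommendation):

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

// Salted, iterated one-way hashing with PBKDF2-HMAC-SHA256 (Java 8+).
public class PasswordHashing {
    static byte[] newSalt() {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    static byte[] hash(char[] password, byte[] salt, int iterations) {
        try {
            // 256-bit derived key; iterations raises the attacker's cost
            PBEKeySpec spec = new PBEKeySpec(password, salt, iterations, 256);
            SecretKeyFactory f = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256");
            return f.generateSecret(spec).getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Store the salt, iteration count, and hash together; verify by re-deriving with the stored salt and comparing.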

Kimball Robinson
  • Your answer has one problem only: entropy is a measure applied to distributions, not individual values. Kolmogorov complexity would be more suited, but is practically incomputable. I don't want to sound harsh, but it's better to stick with libraries specifically designed for the purpose of password validation than coming up with eloquent but ultimately inapplicable theoretical constructs. – blubb Feb 11 '14 at 22:36

In pure java, you can use VT Password to check password strength. Available from Maven Central:

<dependencies>
  <dependency>
      <groupId>edu.vt.middleware</groupId>
      <artifactId>vt-password</artifactId>
      <version>3.1.1</version>
  </dependency>
</dependencies>
Benoit Wickramarachi