(Note: I am a security expert but not a Java expert.)
Yes, there is a significant security advantage in using char[] rather than strings for passwords. This also applies to some extent to other highly confidential data, although most highly confidential data (e.g. cryptographic keys) tends to be bytes and not characters.
The old, and still valid, reason to use char[] is to clean up memory as soon as it is used, which is not possible with String. This is a very firmly established security practice. For example, in the (in)famous FIPS 140 requirements for cryptographic processing, which are generally considered to be security requirements, there are in fact extremely few security requirements at level 1 (the easiest level). Just two, in fact: one is that you may only use approved cryptographic algorithms, and the other is that keys, passwords and other sensitive data must be wiped after use.
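To make the wiping idea concrete, here is a minimal sketch. The class name and the authenticate method are hypothetical placeholders for whatever actually consumes the password; only Arrays.fill is a real library call. Note that this is best effort: the garbage collector may already have moved the array and left a stale copy behind.

```java
import java.util.Arrays;

final class PasswordWipeSketch {
    // Hypothetical stand-in for the code that consumes the password
    // (hashing, sending it to an authentication service, ...).
    static boolean authenticate(char[] password) {
        return password.length > 0;
    }

    // Uses the password, then overwrites it in place whether or not
    // authenticate() threw. A String offers no equivalent operation.
    static boolean useAndWipe(char[] password) {
        try {
            return authenticate(password);
        } finally {
            Arrays.fill(password, '\0');
        }
    }
}
```

After useAndWipe returns, the caller's array contains only zero characters, so a later memory dump of that array reveals nothing.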
This practice is one of the reasons why production implementations of cryptographic primitives are usually written in languages with manual memory management such as C, C++ or Rust: cryptography implementers want to retain control of where sensitive data goes, and to be sure to wipe all copies of sensitive material.
As an example of what can go wrong, consider the (in)famous Heartbleed bug. It allowed anyone on the Internet connecting to a vulnerable server to dump some of the memory of the server, without being detected. The attacker didn't get much control over which part of the memory, but could try again and again. An attacker could make requests that would cause the dumpable part to move around the heap, and thus could potentially dump the whole memory.
Are such bugs common? No. This one got a lot of buzz because it was in very popular software and the consequences were bad. But such bugs do exist and it's good to protect against them.
In addition, since Java 8, there is another reason, which is to avoid string deduplication. String deduplication means that if two String objects have the same content, they may be merged. String deduplication is problematic if an attacker can mount a side channel attack when the deduplication is attempted. The attack does not require the password to be deduplicated (although it is easier in this case): there's a problem as soon as some code compares the password against another string.
The usual way to compare strings for equality is:
- If the lengths are different, return false.
- Otherwise compare the characters one by one. As soon as there are different characters at one position, return false.
- If the end of the strings is reached without encountering a difference, return true.
This has a timing side channel: the time of the middle step depends on the number of identical characters at the beginning of the string. Suppose that an attacker can measure this time, and can upload some strings for comparison (e.g. by making legitimate requests to a server). The attacker notices that comparing with sssssssss takes slightly longer than comparing with aaaaaaaaa, so the password must begin with s. Then the attacker tries to vary the second character, and finds that comparing with swwwwwwww again takes slightly longer. And thus, in a relatively short time, the attacker can reconstruct the password character by character.
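The leak can be illustrated with a small sketch. earlyExitSteps follows the three comparison steps and returns the number of character comparisons performed, as a stand-in for elapsed time. constantTimeEquals is a commonly used mitigation that I am adding as an assumption about how one might avoid the early exit; it is not something Java's String.equals does. All names here are hypothetical, and a JIT compiler gives no hard guarantee about the timing of either loop.

```java
final class CompareSketch {
    // Early-exit comparison. The return value is the number of character
    // comparisons made, which grows with the length of the matching prefix.
    static int earlyExitSteps(char[] secret, char[] guess) {
        if (secret.length != guess.length) {
            return 0;
        }
        int steps = 0;
        for (int i = 0; i < secret.length; i++) {
            steps++;
            if (secret[i] != guess[i]) {
                return steps; // stops at the first difference
            }
        }
        return steps;
    }

    // Visits every position regardless of where the first difference is,
    // so the loop length does not depend on the matching prefix.
    static boolean constantTimeEquals(char[] a, char[] b) {
        if (a.length != b.length) {
            return false;
        }
        int diff = 0;
        for (int i = 0; i < a.length; i++) {
            diff |= a[i] ^ b[i];
        }
        return diff == 0;
    }
}
```

With a hypothetical secret of swordfish, the guesses aaaaaaaaa, sssssssss and swwwwwwww cost 1, 2 and 3 comparisons respectively, which is exactly the signal the attacker measures. For byte arrays, the standard library's MessageDigest.isEqual serves the same purpose as constantTimeEquals.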
In the context of string deduplication, the attack is harder, because (as far as I know) the deduplication code first hashes the strings to compare. This may mean that the attacker has to first guess the hash value. But the total number of hash values in a given hash table (that's the number of hash buckets, not the full range of the hash method) is small enough that it's practical to enumerate.
This is not an easy attack, to be sure. But I would absolutely not rule it out, especially with a local attacker, but even with a remote attacker. Remote timing attacks are practical (still).
In conclusion, yes, you should not use String for passwords. Read them as char[], keep careful track of any copies, hash them as soon as possible if you're verifying them, and wipe all copies.
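As a sketch of "hash as soon as possible, then wipe", the following uses the JDK's PBKDF2 support, which accepts the password directly as a char[]. The class name, the iteration count and the key length are illustrative assumptions, not recommendations from this answer. The JCA provider may still hold internal copies that this code cannot reach.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

final class PasswordHashSketch {
    static final int ITERATIONS = 100_000; // illustrative value only
    static final int KEY_BITS = 256;

    static byte[] newSalt() {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    // Derives a hash from the password, then wipes both the caller's
    // array and the copy that PBEKeySpec made internally.
    static byte[] hashAndWipe(char[] password, byte[] salt) {
        PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, KEY_BITS);
        try {
            SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256");
            return factory.generateSecret(spec).getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        } finally {
            spec.clearPassword();
            Arrays.fill(password, '\0');
        }
    }

    // Hashes the attempt and compares hashes rather than passwords.
    static boolean verify(char[] attempt, byte[] salt, byte[] expectedHash) {
        byte[] actual = hashAndWipe(attempt, salt);
        return MessageDigest.isEqual(actual, expectedHash);
    }
}
```

Because verify compares derived hashes instead of the passwords themselves, the plaintext exists in memory only for the duration of the derivation.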
If you need to store a password for a third-party service, it's a good idea to store it in encrypted form even if there is no separate access control for the encryption key. Copies of an encrypted password are less prone to leaking through side channels than copies of the password itself, which is a printable string with low entropy.
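Here is one way this could look, as a sketch under my own assumptions: AES-GCM as the cipher, a 12-byte random nonce stored in front of the ciphertext, and UTF-8 as the encoding. The class and method names are hypothetical. The conversion between char[] and bytes goes through CharBuffer and ByteBuffer so that no String is ever created.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

final class StoredPasswordSketch {
    private static final int NONCE_BYTES = 12;
    private static final int TAG_BITS = 128;

    static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns nonce || ciphertext+tag and wipes every plaintext copy it made,
    // including the caller's array.
    static byte[] encrypt(char[] password, SecretKey key) {
        ByteBuffer encoded = StandardCharsets.UTF_8.encode(CharBuffer.wrap(password));
        byte[] plain = new byte[encoded.remaining()];
        encoded.get(plain);
        try {
            byte[] nonce = new byte[NONCE_BYTES];
            new SecureRandom().nextBytes(nonce);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, nonce));
            byte[] sealed = cipher.doFinal(plain);
            byte[] stored = Arrays.copyOf(nonce, NONCE_BYTES + sealed.length);
            System.arraycopy(sealed, 0, stored, NONCE_BYTES, sealed.length);
            return stored;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        } finally {
            Arrays.fill(plain, (byte) 0);
            if (encoded.hasArray()) {
                Arrays.fill(encoded.array(), (byte) 0);
            }
            Arrays.fill(password, '\0');
        }
    }

    // Recovers the password as a char[]; the caller must wipe it after use.
    static char[] decrypt(byte[] stored, SecretKey key) {
        byte[] plain = null;
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(TAG_BITS, stored, 0, NONCE_BYTES));
            plain = cipher.doFinal(stored, NONCE_BYTES, stored.length - NONCE_BYTES);
            CharBuffer decoded = StandardCharsets.UTF_8.decode(ByteBuffer.wrap(plain));
            char[] password = new char[decoded.remaining()];
            decoded.get(password);
            if (decoded.hasArray()) {
                Arrays.fill(decoded.array(), '\0');
            }
            return password;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        } finally {
            if (plain != null) {
                Arrays.fill(plain, (byte) 0);
            }
        }
    }
}
```

The stored bytes look random, so a stray copy of them gives an attacker far less to work with than a copy of the printable password would.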
I think I've also read somewhere that the internals of URLConnection (and many other classes) uses String internally to handle the data. So if you ever send a password (although the password is encrypted by TLS over the wire), it will be in a String in your server's memory.
I'm not a Java expert, but this doesn't sound right: the plaintext of a connection (TLS or otherwise) is a byte stream, not a character stream. It should be arrays of 8-bit bytes, not arrays of Unicode code points.
Or that your password will end up in a String anyway due to classes written by others, is that why Thales' doing it.
Possibly. Or possibly because they aren't Java experts, or because the people who write the high-level layers are often not the foremost security experts.