72

I learned about using char[] to store passwords back in the Usenet days in comp.lang.java.*.

Searching Stack Overflow, you can also easily find highly upvoted questions like "Why is char[] preferred over String for passwords?", which agrees with what I learned a long, long time ago.

I still write my APIs to use char[] for passwords. But is that just a hollow ideal now?

For example, look at Atlassian Jira's Java API: LoginManager.authenticate which takes your password as a String.

Or Thales' Luna Java API: login() method in LunaSlotManager. Of all people, an HSM vendor using String for the HSM slot password.

I think I've also read somewhere that the internals of URLConnection (and many other classes) use String to handle the data. So if you ever send a password (even though it is encrypted by TLS over the wire), it will be in a String in your server's memory.

Is accessing server memory an attack vector so difficult to achieve that it is okay to store passwords as String now? Or is Thales doing it because your password will end up in a String anyway due to classes written by others?

Peter Mortensen
KC Wong
  • 13
    JDBC also uses strings for passwords and it’s as old as Java 1.1. In the end, nothing has changed. Using `char[]` instead of `String` is just a best-effort with no guarantee to work better than a `String`. The question is not, how difficult accessing server memory is. The question is, how difficult is it for an attacker who already has access to your memory, to get the password anyway, maybe even before your application gets it. – Holger Oct 06 '22 at 06:42
  • 1
    @Holger: Did this come with instructions to erase the array when done with it? – Joshua Oct 07 '22 at 20:04
  • 2
    @Joshua what do you mean with “this”? JDBC? Of course, it doesn’t have such instructions, as there is no array to erase. – Holger Oct 10 '22 at 07:27
  • @Holger: This = "I learned about using char[] to store passwords " – Joshua Oct 10 '22 at 13:43
  • 1
    @Joshua why do you ask me? I’m not the one who said that. – Holger Oct 10 '22 at 14:13
  • @Holger: I may have made a mistake. I can't figure out a why. I have a real hypothesis that depends on `char[]` and clearing the memory; and it also has good explaining power why JDBC didn't care. – Joshua Oct 10 '22 at 14:32

5 Answers

49

First, let’s recall the reason for the recommendation to use char[] instead of String: Strings are immutable, so once the string is created, there is limited control over the contents of the string until (potentially well after) the memory is garbage collected. An attacker that can dump the process memory can thus potentially read the password data. Meanwhile the contents of the char[] object can be overwritten after it has been created. Assuming that this is done, and that the GC hasn’t moved the object to another physical memory location in the interim, this means that the password contents can be destroyed (somewhat) deterministically after it has been used. An attacker reading the process memory after that point won’t be able to get the password.
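As a minimal sketch of the overwrite step described above (the class and method names are hypothetical, chosen only for illustration), the array can be cleared in a `finally` block so that it is wiped even if the code using it throws. This only clears the one array it is given; it does nothing about copies made by the GC or by other code.

```java
import java.util.Arrays;
import java.util.function.Predicate;

final class PasswordWipe {
    // Runs the given check with the password, then always overwrites
    // the caller's array, whether the check succeeded or threw.
    static boolean useAndWipe(char[] password, Predicate<char[]> check) {
        try {
            return check.test(password);
        } finally {
            // Overwrites this array in place. Copies that the GC or other
            // code may have made elsewhere are NOT affected.
            Arrays.fill(password, '\0');
        }
    }

    public static void main(String[] args) {
        char[] password = {'h', 'u', 'n', 't', 'e', 'r', '2'};
        // Stand-in check for illustration; a real one would verify the password.
        boolean ok = useAndWipe(password, p -> p.length >= 6);
        System.out.println(ok);                  // true
        System.out.println(password[0] == '\0'); // true: contents are gone
    }
}
```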

So using char[] instead of String lets you prevent a very specific attack scenario, where the attacker has full access to the process memory,[1] but only at specific points in time rather than continuously. And even under this scenario, using char[] and overwriting its contents does not prevent the attack; it just reduces its chance of success (if the attacker happens to read the process memory between the creation and the erasure of the password, they can read it).

I am not aware of any evidence that shows (a) how frequent this scenario is, nor (b) how much this mitigation reduces the probability of success under that scenario. As far as I know, this is pure guesswork.

In fact, on most systems, this scenario likely does not exist at all: an attacker who can get access to another process’ memory can also gain full tracing access. For instance, on both Linux and Windows any process that can read another process’ memory can also inject arbitrary logic into that process (e.g. via LD_PRELOAD and similar mechanisms[2]). So I would say that this mitigation at best has a limited benefit, and potentially none at all.

… Actually I can think of one specific counter-example: an application that loads an untrusted plugin library. As soon as that library is loaded via conventional means (i.e. in the same memory space), it has access to the parent application. In this scenario, it might make sense to use char[] instead of String, and overwrite its contents when done with it, if the password is handled before the plugin is loaded. But a better solution would be not to load untrusted plugins into the same memory space. A common alternative is to launch it in a separate process and communicate via IPC.

(See the answer by Gilles for more vulnerable scenarios. I still think that the benefit is relatively limited, but it’s clearly not nil.)


[1] As shown in Gilles’ answer, this is not correct: no full memory access is required to mount a successful attack.

[2] Although LD_PRELOAD specifically requires the attacker to not only have access to another process but either to launch that process, or to have access to its parent process.

Konrad Rudolph
  • 10
    To be noted: using `char[]` alone does not give any benefit. The benefits are provided by overwriting the value. So changing `String password` to `char[] password` provides no benefits at all. It's the overwriting of `password` after use that would provide any benefits. Also, I'm not sure that the Java garbage collector will zero out or overwrite memory while moving objects around, so it's likely that even a `char[]` can leave copies of its contents around in unused RAM. So to maximize benefits you'd also want to prevent garbage collections between loading the password into RAM and overwriting it. – GACy20 Oct 06 '22 at 14:43
  • 3
    @GACy20 Didn’t my answer already mention this? Your first point is made fairly explicitly in my answer. I addressed the second point in a mere half-sentence to avoid bloating the already-long answer even more (and it’s made more explicit in the discussion that’s linked in the question). And no, (AFAIK) the GC does *not* zero out the memory it moves. – Konrad Rudolph Oct 06 '22 at 14:54
  • @KonradRudolph: Although it would be awkward, one could design an "encrypted password" class which behaved as a "collection of bytes" class, but whose accessors all accepted a pair of `long` values as arguments that would be interpreted as keys for AES or some similar scheme. If the keys were kept on the call stack, one could ensure that the heap would never contain all of the information necessary to extract the password. – supercat Oct 06 '22 at 18:01
  • @supercat Unfortunately the call stack is in principle *also* inspectable to outside processes. Functional interposition can intercept calls to any libraries, and with a bit more work it could probably also view the rest of the stack. That said, as an additional layer this might make sense. The question is whether it’s worth the effort. – Konrad Rudolph Oct 07 '22 at 06:53
  • 2
    It's not just about partial vs full memory access, but also what gets compromised. If the attacker gets memory read access but not remote execution, this compromises secrets that are currently present in the memory. Zeroizing secret data as soon as it isn't needed anymore helps in this scenario because only _current_ secrets are then compromised and not also _past_ secrets. – Gilles 'SO- stop being evil' Oct 07 '22 at 10:16
  • "… Actually I can think of one specific counter-example: an application that loads an untrusted plugin library." - Possibly also RCE in C code. – Maciej Piechotka Oct 07 '22 at 15:58
37

(Note: I am a security expert but not a Java expert.)

Yes, there is a significant security advantage in using char[] rather than strings for passwords. This also applies to some extent to other highly confidential data, although most highly confidential data (e.g. cryptographic keys) tends to be bytes and not characters.

The old, and still valid, reason to use char[] is to clean up memory as soon as it is no longer needed, which is not possible with String. This is a very firmly established security practice. For example, in the (in)famous FIPS 140 requirements for cryptographic processing, which are generally considered to be security requirements, there are in fact extremely few security requirements at level 1 (the easiest level). Just two, in fact: one is that you may only use approved cryptographic algorithms, and the other one is that keys, passwords and other sensitive data must be wiped after use.

This practice is one of the reasons why production implementations of cryptographic primitives are usually written in languages with manual memory management such as C, C++ or Rust: cryptography implementers want to retain control of where sensitive data goes, and to be sure to wipe all copies of sensitive material.

As an example of what can go wrong, consider the (in)famous Heartbleed bug. It allowed anyone on the Internet connecting to a vulnerable server to dump some of the memory of the server, without being detected. The attacker didn't get much control over which part of the memory, but could try again and again. An attacker could make requests that would cause the dumpable part to move around the heap, and thus could potentially dump the whole memory.

Are such bugs common? No. This one got a lot of buzz because it was in very popular software and the consequences were bad. But such bugs do exist and it's good to protect against them.

In addition, since Java 8, there is another reason, which is to avoid string deduplication. String deduplication means that if two String objects have the same content, they may be merged. String deduplication is problematic if an attacker can mount a side channel attack when the deduplication is attempted. The attack does not require the password to be deduplicated (although it is easier in this case): there's a problem as soon as some code compares the password against another string.

The usual way to compare strings for equality is:

  • If the lengths are different, return false.
  • Otherwise compare the characters one by one. As soon as there are different characters at one position, return false.
  • If the end of the strings is reached without encountering a difference, return true.

This has a timing side channel: the time of the middle step depends on the number of identical characters at the beginning of the string. Suppose that an attacker can measure this time, and can upload some strings for comparison (e.g. by making legitimate requests to a server). The attacker notices that comparing with sssssssss takes slightly longer than comparing with aaaaaaaaa, so the password must begin with s. Then the attacker tries to vary the second character, and finds that comparing with swwwwwwww takes again slightly longer. And thus, in relatively short time, the attacker can reconstruct the password character by character.
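The early-exit comparison described above can be sketched as follows, next to `java.security.MessageDigest.isEqual`, which recent JDKs document as taking time that does not depend on the contents being compared. The class and method names are hypothetical, and the sketch compares bytes rather than strings.

```java
import java.security.MessageDigest;

final class CompareTiming {
    // Early-exit comparison: the loop stops at the first mismatch, so the
    // running time grows with the length of the common prefix.
    static boolean earlyExitEquals(byte[] a, byte[] b) {
        if (a.length != b.length) {
            return false;
        }
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) {
                return false; // leaks how many leading bytes matched
            }
        }
        return true;
    }

    // Delegates to the JDK's comparison, which is intended not to
    // short-circuit on the first differing byte.
    static boolean constantTimeEquals(byte[] a, byte[] b) {
        return MessageDigest.isEqual(a, b);
    }

    public static void main(String[] args) {
        byte[] secret = {'s', 'w', 'o', 'r', 'd'};
        byte[] guess = {'s', 'w', 'w', 'w', 'w'};
        System.out.println(earlyExitEquals(secret, guess));    // false
        System.out.println(constantTimeEquals(secret, guess)); // false
    }
}
```

Both methods return the same result; they differ only in how much the elapsed time reveals about where the first mismatch occurred.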

In the context of string deduplication, the attack is harder, because (as far as I know) the deduplication code first hashes the strings to compare. This may mean that the attacker has to first guess the hash value. But the total number of hash values in a given hash table (that's the number of hash buckets, not the full range of the hash method) is small enough that it's practical to enumerate.

This is not an easy attack, to be sure. But I would absolutely not rule it out, especially with a local attacker, but even with a remote attacker. Remote timing attacks are practical (still).

In conclusion, yes, you should not use String for passwords. Read them as char[], keep careful track of any copies, hash them as soon as possible if you're verifying them, and wipe all copies.
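A minimal sketch of "hash as soon as possible, then wipe", assuming PBKDF2 from the standard `javax.crypto` API is an acceptable password hash for the application. The class name, iteration count and key length are illustrative assumptions, not recommendations. `PBEKeySpec` takes the password as `char[]` and offers `clearPassword()` for its internal copy; the provider may still hold further internal copies that this code cannot reach.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Arrays;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

final class PasswordHashing {
    static final int ITERATIONS = 100_000; // illustrative value (assumption)
    static final int KEY_BITS = 256;       // illustrative value (assumption)

    // Derives a hash from the password and wipes the caller's array.
    static byte[] hashAndWipe(char[] password, byte[] salt) {
        PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, KEY_BITS);
        try {
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                    .generateSecret(spec)
                    .getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("PBKDF2 is not available", e);
        } finally {
            spec.clearPassword();        // wipes the spec's own copy
            Arrays.fill(password, '\0'); // wipes the caller's array
        }
    }

    // Hashes the candidate (wiping it) and compares against the stored hash.
    static boolean verify(char[] candidate, byte[] salt, byte[] expectedHash) {
        return MessageDigest.isEqual(hashAndWipe(candidate, salt), expectedHash);
    }
}
```

After `hashAndWipe` returns, only the derived hash remains in the caller's hands; the salt would normally be random and stored next to the hash.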

If you need to store a password for a third-party service, it's a good idea to store it in encrypted form even if there is no separate access control for the encryption key. Copies of an encrypted password are less prone to leaking through side channels than copies of the password itself, which is a printable string with low entropy.

> I think I've also read somewhere that the internals of URLConnection (and many other classes) uses String internally to handle the data. So if you ever send a password (although the password is encrypted by TLS over the wire), it will be in a String in your server's memory.

I'm not a Java expert, but this doesn't sound right: the plaintext of a connection (TLS or otherwise) is a byte stream, not a character stream. It should be arrays of 8-bit bytes, not arrays of Unicode code points.
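For illustration, here is a sketch of how a password held in a `char[]` could be turned into bytes for such a byte stream without ever creating a `String`. The class and method names are hypothetical; the calls used (`CharBuffer.wrap`, `Charset.encode`) are standard `java.nio` API. The caller remains responsible for wiping both the input array and the returned bytes.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class PasswordBytes {
    // Encodes the characters as UTF-8 without going through a String.
    static byte[] toUtf8(char[] password) {
        ByteBuffer encoded = StandardCharsets.UTF_8.encode(CharBuffer.wrap(password));
        byte[] out = new byte[encoded.remaining()];
        encoded.get(out);
        // Best effort: clear the intermediate buffer if its backing
        // array is accessible.
        if (encoded.hasArray()) {
            Arrays.fill(encoded.array(), (byte) 0);
        }
        return out;
    }

    public static void main(String[] args) {
        char[] password = {'a', 'b', 'c'};
        byte[] bytes = toUtf8(password);
        System.out.println(bytes.length); // 3
        Arrays.fill(bytes, (byte) 0);     // wipe once the bytes have been sent
        Arrays.fill(password, '\0');
    }
}
```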

> Or that your password will end up in a String anyway due to classes written by others, is that why Thales' doing it.

Possibly. Or possibly because they aren't Java experts, or because the people who write the high-level layers are often not the foremost security experts.

Gilles 'SO- stop being evil'
  • 1
    as security expert, would you mind shedding some light on [very specific attack scenario: the attacker has full access to the process memory](https://stackoverflow.com/a/73970349/3426309) opinion? Are diagnostic dumps, backup operators, etc also part of security concerns? – Andrey B. Panfilov Oct 06 '22 at 22:11
  • 2
    @AndreyB.Panfilov Please note that I have amended this part of my answer, since it was clearly incorrect. The rest of the answer stands; in particular, once another application has full access to the memory of a process, it’s game over as far as mitigation via password zeroisation (= overwriting `char[]` buffer) is concerned. But I hadn’t considered vulnerabilities that provide *incomplete* memory access (or side channels), as described in this answer. That’s why you listen to security experts. :-) – Konrad Rudolph Oct 07 '22 at 09:54
  • 2
    +1, but the bit about Java's string deduplication feature is not quite right. That feature uses the hash not only to find the right hash bucket, but also to filter *within* that bucket, by explicitly comparing hashes before comparing lengths and contents. \[[source link](https://github.com/openjdk/jdk/blob/4594696f5462995ec58ca1d2c1bde7cc857c5caf/src/hotspot/share/gc/shared/stringdedup/stringDedupTable.cpp#L186)\] So it's not as simple to enumerate hashes as this answer suggests. – ruakh Oct 07 '22 at 22:17
  • 5
    This seems overly dramatic. Mounting a side channel attack on string deduplication requires so much determination that such an attacker could better use it to find, like, five 0-days in the JVM itself. Furthermore, this answer doesn't weigh the security gain against the added complexity of using `char[]/byte[]` instead of `String`, and doesn't take into consideration that, with very high probability, the password will nonetheless be stored in a string at some point before or after it has been handed to you. Nor does it deal with the fact that most apps need to keep an equally secret auth state. – Margaret Bloom Oct 09 '22 at 12:50
  • 3
    String deduplication is only applied to strings which survived multiple garbage collection phases. So, not only would the attacker have to somehow bring the strings into the server but also somehow make the server keep strong references to those strings (and the password string itself) and then somehow get hands on the garbage collection statistics. And, of course, the attacker would somehow be able to extract the fraction of their strings from the time needed for all other strings, not to mention, the costs of the actual garbage collection. For a *concurrent* collector, typically. – Holger Oct 10 '22 at 07:43
14

Lots of detail in the answers but here's the short of it: yes, in theory, putting the password in an array and wiping it provides security benefits. In practice, that only helps if you can avoid the password ever being stored in a String. That is, if you take a password stored in a String and put the contents of the String into a char[], it doesn't magically make the String disappear from the heap. The necessary requirement is that the password never is placed in a String at all. I'd be interested to see that successfully implemented in a real Java application.
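The point that copying a String into a `char[]` does not remove the String can be shown with a small sketch (hypothetical class and method names): `String.toCharArray()` returns an independent copy, so wiping that copy leaves the original String's contents untouched.

```java
import java.util.Arrays;

final class StringCopyRemains {
    // Copies the String into an array, wipes the array, and reports
    // whether the String still holds its original characters.
    static boolean stringUnchangedAfterWipingCopy(String password) {
        char[] reference = password.toCharArray(); // kept only for the comparison
        char[] copy = password.toCharArray();      // independent copy of the contents
        Arrays.fill(copy, '\0');                   // wipes the copy only
        return Arrays.equals(password.toCharArray(), reference);
    }

    public static void main(String[] args) {
        // Stands in for a password that some API handed over as a String.
        String fromApi = "hunter2";
        System.out.println(stringUnchangedAfterWipingCopy(fromApi)); // true
    }
}
```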

JimmyJames
  • 1
    If the only Strings to contain the password are very short-lived and local, they might get GCed and hopefully reused sooner than your `char[]` object? Or even allocated on the stack instead of heap if it's just local to one function? Hmm, that's probably not very good, the whole idea is to get the memory overwritten *before* letting the GC put it on a free list. So yeah, I think this is pretty much accurate. I was thinking you might have an object that couldn't even get GCed, but probably you can overwrite it with an empty string, updating the reference to it. So that doesn't justify it. – Peter Cordes Oct 09 '22 at 01:26
  • @PeterCordes It is possible that you could have all copies of the string be overwritten quickly by a copy collector. I'm struggling to imagine that being isolated to the stack for escape analysis because the password value is going to come from some sort of I/O. But again, putting the string value into a char[] doesn't change anything about what happens with the strings. – JimmyJames Oct 10 '22 at 14:42
  • My hypothetical justification for `char[]` maybe helping a small amount was that passing around more references to the String might be worse, and might result in it even getting copied more times. But probably not *much* if any worse, and String being immutable probably doesn't tend to get copied. Or if some API forces you to start with the password as a String, copying to a `char[]` ASAP while there's still some work left to do could increase the chance of the String getting GCed and maybe overwritten. (And you can zero out the char[] yourself at the end.) – Peter Cordes Oct 10 '22 at 14:54
  • Those are much less good than never having had the password exist as a String at all, of course. This answer makes a good point, I'm just trying to see if there's any way that it's not entirely useless to use a `char[]` at all, if you can't avoid `String` entirely. Maybe only mostly useless. – Peter Cordes Oct 10 '22 at 14:56
  • @PeterCordes It should be theoretically possible but (spitballing here) I think you'd need to design the entire stack around char[]. For example, consider basic auth with HTTP headers. We can't put any of the headers (or header section) into a string until the content has been understood. The API would then need to be based around char[] or have special features for getting values that should not be put into strings. I could be wrong but I can't recall ever seeing an API that was like that. And with the number of layers in contemporary Java applications, it seems impractical. – JimmyJames Oct 10 '22 at 16:16
11

Almost everyone else's answer plus one additional point: swap space on a storage medium.

If the JVM heap is ever paged out to disk and the password is still in memory as a String (immutable and not yet GC'd), it will be written to the swap file. This swap file can then be scanned for password values. So this is essentially another attack vector, one that is time-based and still rather difficult to exploit, but obviously not that difficult, because we're here :D.

Wiping the mutable array at least reduces the time where the password is in memory.

The story I heard was that if an attacker can attack your process (like a DDOS) and trigger the process to swap out, it's somewhat easier to attack the swap space than the memory, AND swap space is preserved across boots/crashes/etc. This allows for yet another attack vector where the attacker pulls the swap drive out to scan the swap space.

millebi
  • 3
    Yup, exactly. If you can get the JVM to crash on a system where the OS saves a core-dump of the process to a file, that works too. (Perhaps due to a bug in native code the Java program called via JNI? Or a bug in the JVM itself. A bug you can exploit to cause a crash (DOS), but haven't found a way to gain control of the process.) Or yes, a DOS attack that triggers high load on the server by having lots of processes active using RAM. Or perhaps the Java application in question is on a laptop that you will later steal, which doesn't use full-disk encryption or is left unlocked. – Peter Cordes Oct 09 '22 at 01:18
  • 3
    If somebody has such control (physical access and full root!) over the system running the target application, it probably can simply patch the application itself or intercept clean traffic. – Margaret Bloom Oct 09 '22 at 12:58
  • I forgot to mention that the heap paging attack was a lot easier when harddrive encryption was not reasonable or common (physical security seemed good enough back in those days), and there were more easily exploitable privilege escalations available. – millebi Oct 09 '22 at 16:14
5

It's not about the moment of transfer over the network. There you're indeed better off using a String, as it's just more convenient to send over the network, of course making sure it's properly encrypted.

For using passwords in applications it's different, due to stack dumps and reverse engineering, and the problem of String being immutable: once the password has been entered, even if the variable is reassigned to another string, there is no certainty about when the garbage collector will actually remove the String from the heap. So a hacker who is able to see the dump will also be able to see the password. Using an array of char avoids this, as you can change the data in the array directly without relying on the garbage collector.

Now you might say: well then, when sending it over the network as a String, it'll still be visible, no? Well yes, but that's why encrypting it before sending it is important. Never send plain text passwords over the network if you can avoid it.

  • 1
    When sending the password over the network you are *also* using it in the sending and receiving applications. So I don’t think this distinction exists, or that it’s the reason why some APIs use `String` rather than `char[]` for passwords (if they did, the reasoning would still be flawed). – Konrad Rudolph Oct 06 '22 at 14:57
  • It's how it was explained to me, made sense at the time. – Jurgen Rutten Oct 06 '22 at 17:32
  • Also, on the network, there's no such thing as data types. It's all a stream of bytes, so it doesn't matter what underlying structure is being used at that level. Network attacks are trivial if SSH/TLS isn't in use, and still trivial within the datacenter depending on how the keys are managed between processes. – millebi Oct 09 '22 at 16:17