10

I want to implement a String comparison function that doesn't take a different amount of time depending on the number of characters that match or the position of the first mismatch. I assume there must be a library out there somewhere that provides this, but I was unable to find it via a quick search.

So far, the best idea I've got is to sum the XOR of each character and return whether or not the sum is 0. However, I'm pretty sure this wouldn't work so well with Unicode. I also have a vague concern that HotSpot would do some optimizations that would change my constant-time property, but I can't think of a specific optimization that would do this off the top of my head.

Thanks.

UPDATE: Sorry, I don't believe I was clear. I'm not looking for O(1), I'm looking for something that won't leak timing information. This would be used to compare hashed password values, and if the time it took to compare was different based on where the first mismatch occurred, that would be leaking information to an attacker.

Hank Gay
  • 70,339
  • 36
  • 160
  • 222
  • 7
    The time it takes to compare 2 strings would always be a function of the length of the string. You have to do some sort of comparison one way or another - and the more characters you have, the longer this comparison would take. – Aleks G Aug 25 '11 at 13:25
  • How would the complexity of XORing each character not depend on the number of characters? – Kerrek SB Aug 25 '11 at 13:26
  • 1
    Am I correct in assuming you're asking for a String comparison function (returning `true` of `false` for equal or non-equal) that is entirely independent from String length? I don't actually think that's theoretically possible unless you have a number of processors/cores equal to the maximum String length. The best I can think of is optimizing strongly through comparing hash codes and length first, but even the hash code calculation depends on String length. – G_H Aug 25 '11 at 13:27
  • 1
    you might be able to use the `hash` field, it might help to determine 2 strings are *not* equal, if the hash was already computed, but I think that's about it. – amit Aug 25 '11 at 13:28
  • Can you tell us why you want to do this? Given the answers telling you that this is either impossible or impractical, knowing what you are trying to achieve would help us give you a good answer. – DJClayworth Aug 25 '11 at 13:43
  • Sorry for the original wording, which didn't express my intent. I've attempted to clarify. – Hank Gay Aug 25 '11 at 14:10

6 Answers6

5

I want to implement a String comparison function that doesn't take a different amount of time depending on the number of characters that match or the position of the first mismatch. I assume there must be a library out there somewhere that provides this, but I was unable to find it via a quick search.

So far, the best idea I've got is to sum the XOR of each character

Do you see the contradiction?

update:

To the updated, and therefore different question:

Can you gain information once, how much time is spent for comparing 2 Strings, in terms of constant amount and time, depending on the length of the two strings?

a + b*s(1).length + c*s(2).length + d*f(s(1), s(2))? 

Is there an upper bound of characters for String 1 and 2?

If the time is, depending on a factor for the machine, for example for the longest strings you expect 0.01ms. You measure the time to encode the string, and stay idle until you reach that time, maybe + a factor of rand(10%) of the time.

If the length of the input is not limited, you could calculate the timing in a way, that will fit for 99%, 99.9% or 99.99% of typical input, depending on your security needs, and the speed of the machine. If the program is interacting with the user, a delay up to 0.2s is normally experienced as instant reaction, so it wouldn't annoy the user, if your code sleeps for 0.19994s, while doing real calculations for 0.00006s.

Community
  • 1
  • 1
user unknown
  • 35,537
  • 11
  • 75
  • 121
  • 3
    The randomness doesn't necessarily help since the attacker can just take more samples to exclude it completely. – Michael Mior Sep 12 '12 at 18:23
  • @MichaelMior That's surely true, but the comparison takes some nanoseconds only and adding a random milliseconds delay may make the number of needed samples damn high. – maaartinus Apr 13 '15 at 12:34
  • 1
    The question asked about not leaking info about "the number of characters that match". By excluding "that match" from your bold text, you have completely changed his meaning. This answer is wrong. – Rich Jun 22 '15 at 13:31
3

The following code is one common way to do a constant-time byte[] comparison in Java and avoid leaking password info via the time taken:

public static boolean isEqual(byte[] a, byte[] b) {
    if (a.length != b.length) {
        return false;
    }

    int result = 0;
    for (int i = 0; i < a.length; i++) {
      result |= a[i] ^ b[i];
    }
    return result == 0;
}

See http://codahale.com/a-lesson-in-timing-attacks/ for more discussion of this issue.

Current implementation in openjdk as an inspiration is available in isEqual method.

(This assumes that the length of the secret is not sensitive, for example if it is a hash. You should pad both sides to the same length if that is not true.)

This is essentially the same as your first suggestion:

So far, the best idea I've got is to sum the XOR of each character and return whether or not the sum is 0.

You asked:

However, I'm pretty sure this wouldn't work so well with Unicode.

This is a valid concern, but you need to clarify what you will accept as "equal" for a solution to be proposed. Luckily, you also say "This would be used to compare hashed password values", so I don't think that any of the unicode concerns will be in play.

I also have a vague concern that HotSpot would do some optimizations that would change my constant-time property,

Hopefully that's not true. I expect that the literature on how to avoid timing attacks in Java would address this if it were true, but I can't offer you any citations to back this up :-)

Lubo
  • 1,621
  • 14
  • 33
Rich
  • 15,048
  • 2
  • 66
  • 119
  • Rich, you probably need to look at the early exit on different password lengths. Even the _length_ of the password is information that can be leaked. – paxdiablo Jun 23 '15 at 07:36
  • 1
    Thanks, I've updated my answer to note this. I think in most cases the secret will be a hash, so the length is not sensitive. – Rich Jun 23 '15 at 09:24
3

I see two immediate possibilities for not leaking password-related information in timing:

1/ Pad both the password string and candidate string out to 1K, with a known, fixed character (like A). Then run the following (pseudo-code):

match = true
for i = 0 to 1023:
    if password[i] != candidate[i]:
        match = false

That way, you're always taking the same amount of loops to do the comparison regardless of where it matches.

There's no need to muck about with xor since you can still do a simple comparison, but without exiting the loop early.

Just set the match flag to false if a mismatch is found and keep going. Once the loop exits (taking the same time regardless of size or content of password and candidate), then check whether it matched.

2/ Just add a large (relative to the normal comparison time) but slightly random delay at the end of the comparison. For example, a random value between 0.9 and 1.1 seconds. The time taken for the comparison should be swamped by the delay and the randomness should fully mask any information leakage (unless your randomness algorithm leaks information, of course).

That also has the added advantage of preventing brute force attacks since a password check takes at least about a second.

paxdiablo
  • 854,327
  • 234
  • 1,573
  • 1,953
  • My concern is that the strings might not be normalized, so the function might return `false` even if both values are `español` (to use a completely made up and perhaps not even valid example). Also, I'm aware that Unicode is not an encoding, but as I understand it, normalization is a problem no matter what encoding is in use. – Hank Gay Aug 25 '11 at 14:04
  • @Hank, now your requirement makes a little more sense - I've actually had to do this sort of stuff before, hence my wonder answer at http://stackoverflow.com/questions/712339/why-should-checking-a-wrong-password-take-longer-than-checking-the-right-one/712363#712363 :-) I've updated this answer with a couple of possibilities for you. – paxdiablo Aug 25 '11 at 14:15
  • The padding will have the nice benefit of not even revealing the length of the password, thanks. Do you have any thoughts on the normalization issue? – Hank Gay Aug 25 '11 at 14:41
  • @Hank: yes, normalise them before comparison. All your _stored_ passwords should be normalised in whatever DB/file you store them in (this should be done whenever they're created or updated). Normalise the user input before comparison. This will take a variable length of time based on the size of the user input but the attacker _already_ knows that, so you're revealing nothing. – paxdiablo Aug 25 '11 at 14:43
  • 3
    You may want to change the comparison in 1/ to `match |= password[i] ^ candidate[i]` (Note that this also inverts the value of `match`, so it really should be `no_match` and initialized to `0`). The different branches of the `if` statement can stil leak timing information. (There's an extra jump if the characters match.) – Michael Mior Sep 12 '12 at 18:21
  • @Rich, good point, this answer was from a time when I didn't like removing info but it makes more sense to remove it so I have. As to the delay, I've also modded that a bit to introduce randomness rather than a fixed delay. – paxdiablo Jun 23 '15 at 07:08
  • 2
    I think your point (2) is still bad advice. Adding a random delay will not defeat a timing attack. An external attacker can still estimate X from (X + random(0.9,1.1)) by statistical methods. They'll be averaging out random delays already to take account of the network delays, so this doesn't even make it harder. See http://codahale.com/a-lesson-in-timing-attacks/ for some commentary. (I've removed my previous comment, as you've obsoleted it, thanks.) – Rich Jun 23 '15 at 09:21
  • Rich, the statistical methods will require a rather large number of samples to be taken, something made much harder with the fact each sample now takes a full second. In any case, there are now two independent variables that need to estimated (network delay and arbitrary random delay) which makes the required sample space MUCH larger. – paxdiablo Jun 23 '15 at 14:28
3

This should take approximately the same time for any matching length Strings. It's constant-time with a big constant.

public static boolean areEqualConstantTime(String a, String b) {
    if ( a.length != b.length ) {
        return false;
    }

    boolean equal = true;
    for ( long i = 0; i < (Long)Integer.MAX_INT; i++ ) {
        if ( a.charAt((int)(i % aChars.length)) != b.charAt((int)(i % bChars.length))) {
            equal = false;
        }
    }
    return equal;
}

Edit

Wow, if you're just trying to avoid leaking timing information this facetious answer got pretty close to the mark! We can start with a naive approach like this:

public static boolean arePasswordsEqual(String a, String b) {
    boolean equal = true;
    if ( a.length != b.length ) {
       equal = false;
    }

    for ( int i = 0; i < MAX_PASSWORD_LENGTH; i++ ) {
        if ( a.charAt(i%a.length()) != b.charAt(i%b.length()) ) {
            equal = false;
        }
    }
    return equal;
 }

We need the MAX_PASSWORD_LENGTH constant because we can't simply use either the max or the min of the two input lengths as that would also leak timing information. An attacker could start with a very small guess and see how long the function takes. When the function time plateaus, he would know his password has the right length which eliminates much of the range of values he needs to try.

Mark Peters
  • 80,126
  • 17
  • 159
  • 190
  • @Hank: No problem. Just as an aside, the popular cryptographically secure hashes have constant length outputs and due to the snowball principle leaking timing information probably wouldn't be as big of a deal. Are you comparing passwords in clear text? – Mark Peters Aug 25 '11 at 15:22
  • No, I'm looking at the source to a Java impl of bcrypt. It's certainly not *that* big of a deal, but w/ security I always opt for safer than I think is remotely necessary. I'm thinking about patching it to use this comparison instead of the built-in Java comparison. – Hank Gay Aug 25 '11 at 16:47
  • @Hank: Cool. And by "snowball principle" I meant "avalanche property" – Mark Peters Aug 25 '11 at 16:59
  • 1
    Wouldn't it be simpler to use the length of the user-controlled string instead of MAX_PASSWORD_LENGTH? Since this length is already known, nothing would be leaked. – qerub Nov 03 '12 at 18:55
  • 1
    @Qerub: Yep, you could do that. As I said earlier, this is pretty irrelevant if you're hashing passwords to a fixed-length hash. – Mark Peters Nov 04 '12 at 03:52
  • Note that this answer definitely still leaks timing information: you can repeatedly call `arePasswordsEqual` with different lengths of "test password" and it will return right away if the length of the test and the length of the real password are not the same length. So you just keep trying with various lengths until you get a call that takes longer than the others. Now you know the actual length. – Christopher Schultz Dec 03 '21 at 18:06
  • I know you specifically said that most hash algorithms have equal-length outputs, but that's an implementation detail that was out-of-scope of the original question. – Christopher Schultz Dec 03 '21 at 18:18
  • @ChristopherSchultz: Look again. I don't return right away if the lengths don't match. The result will just never be true. That said, I'm sure there would still be ways to glean information from branch prediction etc, so I'll give the standard advice here: never roll your own crypto. – Mark Peters Dec 05 '21 at 16:44
  • @MarkPeters I must have been having a bad day. You are absolutely right. Apologies for the noise. I'm happy to delete the comments to clean things up. – Christopher Schultz Dec 06 '21 at 12:59
-1

You can do constant time string comparisons if you intern your strings. Then you can compare them with == operator resulting in a constant time equality check.

Read this for further detials

numan salati
  • 19,394
  • 9
  • 63
  • 66
  • 1
    Interning is not an `O(1)` operation though! – Mark Peters Aug 25 '11 at 13:49
  • Yes thats the tradeoff. It depends on whether you have more unique strings or duplicates in your data set. – numan salati Aug 25 '11 at 14:08
  • 1
    The problem is making the comparison of a user-supplied string and a secret to a constant time operation. The secret can be interned in advance, but the time for the interning of the user-supplied string may be observed. It's unclear whether and how it depends on the secret string, but I'd call it risky. – maaartinus Apr 13 '15 at 13:08
  • operator `==` does answer question, if given strings are same instances... Intern could make deduplication and make `==` same result as `.equals`, but timing can leak from calling intern method... – Lubo Sep 06 '22 at 15:09
-1

The usual solution is to set a timer, and not return the result until the timer has expired.

The time taken to compare strings or hashes is not important, just set the time-out value sufficiently large.

Bo Persson
  • 90,663
  • 31
  • 146
  • 203
  • 1
    I don't think that's the "usual" solution, as it's very slow. Most languages and frameworks do something like "OR"ing every byte. See e.g. http://codahale.com/a-lesson-in-timing-attacks/ or http://php.net/manual/en/function.hash-equals.php – Rich Jun 22 '15 at 13:40