7

I want to port a crypto function from C to Java. The function has to run in constant time, so no conditional branchings (and no table lookups based on x) are allowed.

The original C code is:

int x,result;
...
result = (x==7);
...

So that 'result' is set to 1 if 'x==7' and to 0 otherwise. The 'result' variable is then used in further computations.

I am now looking for the best way to transpose this to Java. As in Java expressions evaluate to booleans and not to integers, one has to simulate the above using operators.

I currently use

int x,result;
...
result = (1<<(x-7))&1;
...

which works fine for me, as my x is in the range {0,...,15}. (Note that the shift function uses only the lower 5 bits, so that you will get false positives when x is too large.)

The expression will be evaluated millions of times, so if it there is for instance a clever solution that uses only 2 operators instead of 3, this would make the overall computation faster.

Chris
  • 187
  • 7
  • 2
    Maybe `result = !(x^7)`. I did not time that but it might be faster. –  Sep 04 '15 at 23:23
  • @F3ras: The result must be either 0 or 1 for any value of x, so !(x^7) is not sufficient. – Chris Sep 04 '15 at 23:30
  • 1
    your code accepts any number of the form 32n + 7 for all n that are integers, basically if you put in 39, or 71, or -57, it gives a false positive – spyr03 Sep 05 '15 at 00:15
  • @Chris Why doesn't his method work? Are you using decimals as well? That is the only reason I can think of why this isn't working for you – smac89 Sep 05 '15 at 00:44
  • @spyr03: Yes, thank you, that's a good point! Indeed for the step of the shift only the 5 lower bits are considered. In my case that should be sufficient though. – Chris Sep 05 '15 at 00:56
  • @Smac89: If you have for instance x=3, then !(x^7)=!4 which would be different from 0 and 1, if ! worked on ints. But it even seems that ! only works on booleans. – Chris Sep 05 '15 at 01:05
  • 1
    @Chris What exactly are you trying to do with this check? More code would mean more help – spyr03 Sep 05 '15 at 01:25
  • @spyr03: It is a part of longer cryptographic computation. In order to protect against side channel attacks (like timing attacks), one typically converts if/the/else constructions to computations with logical operators and masks... I was just asking because I thought that somebody could know a better solution, but maybe there is no better solution. – Chris Sep 05 '15 at 01:55
  • Considering the brevity of the code excerpt (just one statement out of an algorithm) and the fact that the presented code obviously fails at its intended purpose, this is now firmly a "How do I accomplish this specific goal?" question rather than a "How could I improve my code?" question. I am therefore migrating it to Stack Overflow. – 200_success Sep 05 '15 at 03:53
  • What is the range of possible values for x? – Richard Schwartz Sep 05 '15 at 04:54
  • OP, I think you should clarify what you mean by "constant time." Most people (me included) will probably read that and think you mean O(1), which a non-looping branch will still give you. You mean really _constant_, so that an attacker can't gain insight into the system by sending in different values and timing how long each one takes (ie, the solution can't be affected by branch prediction). A number of the answers seem to miss that point. – yshavit Sep 05 '15 at 07:54
  • 2
    See [How can I make branchless code?](http://stackoverflow.com/questions/32107088/how-can-i-make-branchless-code/32107468) – apangin Sep 05 '15 at 11:53

5 Answers5

7

The best option as noted by @Hosch250 is ternary operator. Let's take a look at the assembler generated by JIT compiler for this method:

public static int ternary(int x) {
    return x == 7 ? 1 : 0;
}

It actually depends on branch profiling. When your x has value 7 quite often, it's compiled like this:

xor %r11d,%r11d
mov $0x1,%eax
cmp $0x7,%edx
cmovne %r11d,%eax  ;*ireturn
                   ; - Test::ternary@11 (line 12)

See that ternary was replaced with cmovne which is not the branch instruction.

On the other hand if you pass 7 in very rare cases (e.g. once in 5000 calls), then branch is here:

cmp $0x7,%edx
je <slowpath>  ;*if_icmpne
                       ; - Test::ternary@3 (line 12)
xor %eax,%eax

Now branch is almost never taken, so the faster is to keep the condition as CPU branch predictor will be almost always correct. Note that <slowpath> is not just return 1;, it also updates the branch profile, so if it happens that the pattern changed during the program execution (7 become to appear more often), then the method will be recompiled to the first version.

In general, don't try to be smarter than JIT-compiler in such simple cases.

Tagir Valeev
  • 97,161
  • 19
  • 222
  • 334
  • 1
    This is a branch, which is completely inappropriate for crypto functions. – user3427419 Sep 05 '15 at 07:50
  • 1
    @user3427419, let's randomly check some good crypto code, for example, [sha512](https://github.com/openssl/openssl/blob/fbfcb2243941bc84b7585711feb906610f9111c4/crypto/sha/sha512.c#L194) implementation from OpenSSL. Absolutely no branches there? – Tagir Valeev Sep 05 '15 at 08:05
  • 1
    You are showing that a branching instruction is sometimes compiled as non-branching code. Thanks for this interesting insight into the compiled code! I admit that my approach is only a 'best effort' constant time as one does never know what the compiler will do in the end. However, I still have the feeling (I may be wrong) that the compiler is less likely to produce a branching from a purely mathematical expression than from a ternary operator. – Chris Sep 05 '15 at 08:11
  • Timing is not really a problem for hash functions like SHA-512, because there is no secret key. A better example would be for instance at the ref10 implementation of Ed25519. – Chris Sep 05 '15 at 08:16
  • 1
    @Chris, actually branching is not the only possible problem which makes the code non-constant. Seemingly innocent load instruction (e.g. reading the array element) may (or may not) trigger the cache miss which may delay your code by many (up to 100) CPU cycles depending on where it should be loaded from (L1, L2, L3, main memory). Not to mention possible contention, false-sharing, etc. Source code differs very much from what's actually going on inside the CPU. – Tagir Valeev Sep 05 '15 at 08:22
  • 1
    @TagirValeev , yes? Have you looked at the code? The hash functions do not have branches that depend on input data. The only branches you see there are states and lengths, which are *constants* for purposes of algorithm. The rounds are fixed, so for same input data length, the same operations. As for things like "cache misses" those are irrelevant because those can be considered "random". But if you can predict 1 branch in your code vs. another reliably, which you can with (x==7?1:0) conditional, then the entire algorithm is compromised since you can differentiate on internal state. – user3427419 Sep 05 '15 at 08:27
  • @user3427419, cache misses are much less random than many compromised random generators. There are plenty of papers regarding cache miss timing attacks. Read, for example, [this one](http://cr.yp.to/antiforgery/cachetiming-20050414.pdf). – Tagir Valeev Sep 05 '15 at 08:33
  • You are right about memory access. Actually, table lookups depending on secret data must also be avoided. Branching and lookups are only allowed if they do not depend on secret data (e.g. a secret key). Maybe I should have mentioned this in the question; I wanted the question to be readable therefore I tried to keep it simple. But in fact it's good that you mention the cache problem. – Chris Sep 05 '15 at 08:46
  • @TagirValeev sure, but that doesn't mean you give up and give someone backdoor by adding branches. And yes, I know about some advanced side channels (including using power lines to determine CPU state and thus getting the key, or fan noise for same purposes), but those can be mitigated. But if you add branches, then logic is broken and game over. – user3427419 Sep 05 '15 at 08:53
  • @user3427419, well in this particular case you can warm-up your code with data which is uniform enough to produce non-branching (cmovne) code. If you still aware, write critical parts on C and use JNI. – Tagir Valeev Sep 05 '15 at 09:07
  • @TagirValeev If you really want to be sure you have to go down to the assembler level, since with C you have again the same problem: the C compiler can decide to use branchings. – Chris Sep 05 '15 at 09:26
6

OK, so I think that the reason you are asking this is that if the execution time of a crypto function depends on the inputs to the function, then an attacker can gain clues as to those inputs by measuring the execution time. (Hence, the normal "premature optimization" and "don't try to outsmart the compiler" advice don't really apply.)

In the light of that, here are my suggestions:

  • If x is a constant at compile time (or JIT compile time) then the chances are that the code will be optimized to either result = true; or result = false;

  • If x is not a constant, but there is a small range of possible values then one of the following approaches will probably work:

    // It is possible but unlikely that the JIT compiler will 
    // turn this into conditional code.
    private boolean[] LOOKUP = new boolean[] {
            true, true, true, true, true, true, true, false};
    ...
    result = LOOKUP[x];
    
    // This depends on how the JIT compiler translates this to native
    // code.
    switch (x) {
    case 0: case 1: case 2: case 3: case 4: case 5: case 6: 
        result = false;
    case 7:
        result = true;
    }
    

The problem is that in every possible approach I can think of, the JIT compiler could legally optimize non-branching code into branching code. If this is security critical, then you need to investigate the actual native code emitted for every platform that you need to certify.

The other approach is to:

  1. analyze the Java code algorithm,
  2. try to spot cases where conditional branching is likely,
  3. design test inputs to trigger those branching paths,
  4. measure execution time (on all target platforms) to see if there is a detectable difference across your set of test inputs.

Of course, the other thing to note is that this may be moot anyway; e.g. if result is then used in another part of the crypto function to decide with execution path to take.

And ...

The expression will be evaluated millions of times, so if it there is for instance a clever solution that uses only 2 operators instead of 3, this would make the overall computation faster.

If this is your real motivation ... then my advice is Don't Bother. This is premature optimization. Leave it to the JIT compiler.

Stephen C
  • 698,415
  • 94
  • 811
  • 1,216
  • 1
    Two nits: (1) the `true`s and `false`s should be 1 and 0, since `result` is an int (that's really the whole point of the question: how to convert the boolean to an int without branching. And (2) if `x` is a compile-time constant (as defined in JLS 15.28), then `x == 7` _must_ be treated as a compile-time constant, it's not just a chance. – yshavit Sep 05 '15 at 07:43
  • @yshavit, actually JVM rarely care about boolean type. Even in Java bytecode it's converted to int values 0 and 1. Thus `boolean test(int x) {return x==7;}` and `int test(int x) {return x==7?1:0;}` will result in the same bytecode in the method body. – Tagir Valeev Sep 05 '15 at 08:10
  • @TagirValeev Ok, but back in the _code_, it seems to me that OP wants `result` typed as an `int`, presumably so they could do various int-type things with it (adding, shifting, whatever). – yshavit Sep 05 '15 at 08:13
  • @yshavit - 1) I interpret the question as the OP *really* wanting a `boolean` result. 2) Yes I'm aware of that. – Stephen C Sep 05 '15 at 12:51
  • If what they really want is a boolean, wouldn't `x == 7` do that, simply and without branching? I'm not in front of a compiler, so I can't validate that the bytecode or JITed code would be branchless, but I assume it would be. – yshavit Sep 05 '15 at 13:27
3

Since the goal is to accomplish

if (x == 7)
    result = 1;
else
    result = 0;

in some sort of algebraic fashion without branching,

result = 1 >> (x^7);

But then this does not work because right shift is masked to only a few bits. So, what you can do is,

result = 1 >> Integer.bitCount(x^7);

but it's still masked for case of -6 (all bits set in case of -6 ^ 7), so,

int bc = Integer.bitCount(x^7);
return 1 >> (bc | (bc>>1));

So, how much slower is it than a branch conditional? Above solution using bitCount(), to compare entire range int range more than once,

user    0m5.948s

Using branching, (x == 7 ? 1 : 0),

user    0m2.104s

So it's not too bad considering you get constant time comparison that works for any value, 7 being just an example. Yes, Integer.bitCount() is constant time too.

user3427419
  • 1,769
  • 11
  • 15
  • The bitcount is a very interesting idea! I think that most processors have dedicated instructions for bitcount, so that this should probably not involve any branching. I'll try to find out how bitcount is translated to bytecode and to assembler. Many thanks! – Chris Sep 05 '15 at 09:14
  • @Chris The JDK doesn't use those dedicated instructions, it seems: [`Integer.bitCount`](http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/lang/Integer.java#Integer.bitCount%28int%29). (That's from Java 6, cause that's what google gave me with the least looking :) but it's the same on the Java 8 I have locally.) – yshavit Sep 05 '15 at 09:26
  • @yshavit, as many commenters here you underestimate the JVM. The bitCount call is [intrinsified](http://hg.openjdk.java.net/jdk7/jdk7/hotspot/file/c771b7f43bbf/src/share/vm/opto/library_call.cpp#l1727) (replaced with direct assembly implementation). The JDK code is just for interpreter. – Tagir Valeev Sep 05 '15 at 09:38
  • @yshavit Ok, thanks! I am still thinking about this: `Integer.bitcount(.)` is first translated by javac to bytecode (for the class file), and then during execution the just-in-time compiler will (normally) transform the bytecode to CPU (assembler) instructions. But if there exist corresponding bytecode and CPU instructions, then this transformation should be straightforward. :) – Chris Sep 05 '15 at 09:45
  • Thanks for the info, @TagirValeev (though I could probably do without the barbs, but whatevs.) – yshavit Sep 05 '15 at 09:52
2

A ternary would be a good option here:

result = x == 7 ? 1 : 0;

This code assigns 1 to result if the expression x == 7 evaluates to true, and assigns 0 otherwise.

  • 2
    Your code is conditionally branching, similar to an if/then/else sequence. The idea is to _compute_ the result using operators on integers (like: + - & | ^ ...) so that there is no conditional branching. – Chris Sep 05 '15 at 01:10
  • For what you are trying to do, you want a conditional branch. The other code is just bad because it doesn't clearly show what you are doing. –  Sep 05 '15 at 01:56
  • Update: I don't know about cryptographic computing, your way might be better. I'll leave this for the experts to discuss. –  Sep 05 '15 at 01:57
  • @Chris, is the conditional branching so costly? I think its the constant time operation, right? – CodeYogi Sep 05 '15 at 02:52
  • 2
    @CodeYogi: I believe that the OP's reference to "constant time" is not about algorithmic complexity (O(1) as opposed to O(*n*) or whatnot), but rather, about avoiding susceptibility to [timing attacks](https://en.wikipedia.org/wiki/Timing_attack). The OP does seem to be concerned about performance as well, but I don't think that's his/her primary motivation for avoiding conditional branching. – ruakh Sep 05 '15 at 04:39
  • 1
    @ruakh, constant time and Java are just two separate worlds. It's very hard to predict when JVM will take some CPU time to JIT-compile some code or to run the garbage collection. Depending on branch profiling information it may produce quite different code for the same source. It may or may not be inlined into the caller method. The results might be completely different on the next JVM version (even including minor updates). – Tagir Valeev Sep 05 '15 at 07:37
2

Taking advantage of the extremely limited range of x, which is in [0,15], I propose using an in-register table lookup, using one bit per table entry. The table has bit i set for those inputs that should produce a 1, and zero otherwise. This means our table is the literal constant 27 = 128:

int x,result;
result = (128 >> x) & 1;
njuffa
  • 23,970
  • 4
  • 78
  • 130