Multiplying is faster than branching

Question

To compare an if-statement against selective multiplication, I tried the code below. Instead of skipping the work when the condition is false, I multiply the result by 0, and when it is true I multiply it by 1. When there are only 3-4 double-precision multiplications, always computing them this way is faster than using the if-statement.

Question: given that this multiplication trick is faster even on the CPU, how would it perform on a GPU (OpenCL/CUDA)? My vote is for an absolute speedup. What about precision loss with single-precision multiplication? I assume the factor can't always be exactly 1.00000 and might be 0.999999. Let's say I don't mind single-precision loss at the 5th digit.
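On the precision worry: if the flag is stored as an integer 0 or 1 and converted to float or double, the conversion is exact, and under IEEE 754 `x * 1.0f == x` and `x * 0.0f` is a signed zero for any finite `x`, so no 0.999999 factor appears. A minimal Java sketch (class and method names are illustrative, not from the question) that compares the multiply form against the branch form on random single-precision inputs:

```java
import java.util.Random;

public class ExactFlagMultiply {
    // Adds a*b to acc only when flag is 1; flag is assumed to be 0 or 1.
    // Holds for finite a*b; Infinity * 0.0f would give NaN instead of 0.
    static float addIfFlag(float acc, float a, float b, int flag) {
        float check = flag; // int 0/1 widens exactly to 0.0f/1.0f
        return acc + (a * b) * check;
    }

    // Same operation written with a branch, for comparison.
    static float addIfBranch(float acc, float a, float b, boolean flag) {
        return flag ? acc + a * b : acc;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42); // fixed seed so runs are repeatable
        int mismatches = 0;
        for (int i = 0; i < 1_000_000; i++) {
            float acc = rnd.nextFloat(), a = rnd.nextFloat(), b = rnd.nextFloat();
            int flag = rnd.nextInt(2);
            if (addIfFlag(acc, a, b, flag) != addIfBranch(acc, a, b, flag == 1)) {
                mismatches++;
            }
        }
        System.out.println("mismatches: " + mismatches); // prints "mismatches: 0"
    }
}
```

Any precision loss in the computation comes from the product `a * b` itself, which the branch version performs too; the extra multiplication by exactly 1.0f or 0.0f adds none.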

This is better suited to integers, but could it be meaningful at least for floats? If float/half values multiply faster than doubles, this would be even faster.
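For integers there is a common branchless alternative to multiplying by 0 or 1: turn the 0/1 flag into an all-zeros or all-ones bit mask with negation and AND it with the value. A minimal Java sketch with illustrative names; whether the JIT already turns the plain `if` into a conditional move depends on the JVM and CPU, so none of the three forms is guaranteed to win:

```java
public class MaskSelect {
    // Branchy version: add the value only when its flag is 1.
    static long sumBranch(int[] values, byte[] flags) {
        long sum = 0;
        for (int i = 0; i < values.length; i++) {
            if (flags[i] == 1) {
                sum += values[i];
            }
        }
        return sum;
    }

    // Multiply version, as in the question: value * 0 or value * 1.
    static long sumMultiply(int[] values, byte[] flags) {
        long sum = 0;
        for (int i = 0; i < values.length; i++) {
            sum += values[i] * flags[i];
        }
        return sum;
    }

    // Mask version: -1 is all ones in two's complement and -0 is all zeros,
    // so (value & -flag) is value when flag == 1 and 0 when flag == 0.
    static long sumMask(int[] values, byte[] flags) {
        long sum = 0;
        for (int i = 0; i < values.length; i++) {
            sum += values[i] & -flags[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] values = {5, -3, 7, 10};
        byte[] flags = {1, 0, 1, 1};
        System.out.println(sumBranch(values, flags) + " "
                + sumMultiply(values, flags) + " "
                + sumMask(values, flags)); // prints "22 22 22"
    }
}
```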

Result:

 no if: 0.058515741 seconds
 if(){}: 0.073415743 seconds

Can anyone reproduce a similar result? The if(){} loop is the second test, so the JIT couldn't be cheating, could it?

Code:

    public class BranchVsMultiply {
        public static void main(String[] args) {
            final int n = 10_000_000;
            boolean[] ifBool = new boolean[n];
            byte[] ifThen = new byte[n];
            double[] data = new double[n];
            double[] data1 = new double[n];
            double[] data2 = new double[n];

            for (int i = 0; i < n; i++) {
                // about 43% of entries get 1 (add the result), the rest get 0 (add nothing)
                ifThen[i] = (byte) (0.43 + Math.random());
                ifBool[i] = ifThen[i] == 1;
                data[i] = Math.random();
                data1[i] = Math.random();
                data2[i] = Math.random();
            }

            long ref = System.nanoTime();
            for (int i = 0; i < n; i++) {
                // multiplying by 0 adds zero, so the data does not change;
                // multiplying by 1 adds the product, so the data changes
                double check = ifThen[i]; // byte 0/1 converts exactly to 0.0/1.0
                data2[i] += (data[i] * data1[i]) * check;
                data[i] += (data2[i] * data1[i]) * check;
                data1[i] += (data[i] * data2[i]) * check;
            }
            long end = System.nanoTime();
            System.out.println("no if: " + (end - ref) / 1e9 + " seconds");

            ref = System.nanoTime();
            for (int i = 0; i < n; i++) {
                if (ifBool[i]) { // conventional approach, easy to read
                    data2[i] += data[i] * data1[i];
                    data[i] += data2[i] * data1[i];
                    data1[i] += data[i] * data2[i];
                }
            }
            end = System.nanoTime();
            System.out.println("if(){}: " + (end - ref) / 1e9 + " seconds");
        }
    }

CPU: AMD FX-8150 @ 4 GHz

On pipelined CPUs a branch can be very expensive, and multiplies tend to be highly optimised. So I am not surprised much by this. — BevynQ, Jul 04 '13 at 22:59
Can any compiler/interpreter do this automatically for integer multiplication (and the addition afterwards)? — huseyin tugrul buyukisik, Jul 04 '13 at 23:01
A couple of comments on this: 1. You should increase the iterations / time for it to be a meaningful test (perhaps add an outside loop to your loops to loop through the data multiple times) 2. You are changing the values of data in test 1, and then using the new values in test 2. Ideally you should use the exact same data for both tests (although I would not expect it to influence the test a great deal). — Trevor Freeman, Jul 04 '13 at 23:30
Just did what you said and got the same result. Even swapping the order of the loops did not change the result, and repeating the loops gave the same result as well. — huseyin tugrul buyukisik, Jul 04 '13 at 23:36
Java micro-benchmarks such as this are _extremely_ difficult to do correctly. I suggest you read [this SO question and the accepted answer](http://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java) and reconsider how you're doing things. — Jim Garrison, Jul 05 '13 at 22:25
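Trevor Freeman's second point can be addressed by generating the inputs once with a fixed seed and giving each variant its own copy. A minimal Java sketch (class and method names are illustrative) that also checks both variants produce bit-identical arrays, which confirms the multiply trick is exact here and makes the results observable so the JIT cannot discard the work; timing calls would wrap the two variant calls:

```java
import java.util.Arrays;
import java.util.Random;

public class SameInputCheck {
    static void multiplyVariant(byte[] flags, double[] d, double[] d1, double[] d2) {
        for (int i = 0; i < d.length; i++) {
            double check = flags[i]; // exactly 0.0 or 1.0
            d2[i] += (d[i] * d1[i]) * check;
            d[i] += (d2[i] * d1[i]) * check;
            d1[i] += (d[i] * d2[i]) * check;
        }
    }

    static void branchVariant(byte[] flags, double[] d, double[] d1, double[] d2) {
        for (int i = 0; i < d.length; i++) {
            if (flags[i] == 1) {
                d2[i] += d[i] * d1[i];
                d[i] += d2[i] * d1[i];
                d1[i] += d[i] * d2[i];
            }
        }
    }

    static double[] randomArray(Random rnd, int n) {
        double[] a = new double[n];
        for (int i = 0; i < n; i++) a[i] = rnd.nextDouble();
        return a;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        Random rnd = new Random(12345); // fixed seed: same inputs every run
        byte[] flags = new byte[n];
        for (int i = 0; i < n; i++) flags[i] = (byte) (rnd.nextDouble() < 0.43 ? 1 : 0);
        double[] d = randomArray(rnd, n), d1 = randomArray(rnd, n), d2 = randomArray(rnd, n);

        // Each variant works on its own copy of identical input.
        double[] a = d.clone(), a1 = d1.clone(), a2 = d2.clone();
        double[] b = d.clone(), b1 = d1.clone(), b2 = d2.clone();
        multiplyVariant(flags, a, a1, a2);
        branchVariant(flags, b, b1, b2);

        boolean same = Arrays.equals(a, b) && Arrays.equals(a1, b1) && Arrays.equals(a2, b2);
        System.out.println("identical results: " + same); // prints "identical results: true"
    }
}
```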

score 3 · Answer 1 · answered Feb 04 '14 at 13:42

Cannot reproduce your results (CPU only).

Original code: no if: 0.11589088 seconds. if(){}: 0.115732277 seconds.

In reverse order: if(){}: 0.1154809 seconds. no if: 0.115531714 seconds.

Multiple runs produced different results, but if/no_if blocks were practically at parity.

You need a more elaborate benchmark to reach meaningful conclusions: use warm-up runs, fixed random seeds, and average over many calls.
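A minimal hand-rolled harness along those lines, assuming a fixed seed, a few untimed warm-up rounds so the JIT compiles the loops, and the median of many timed rounds. The loops are a simplified reduction (not the question's exact in-place updates) so every round sees the same data; names and round counts are illustrative, and a dedicated tool such as JMH handles more pitfalls than this sketch:

```java
import java.util.Arrays;
import java.util.Random;

public class WarmupBench {
    static double sink; // accumulates results so the JIT cannot drop the work

    // Times one run of the task in nanoseconds.
    static long timeOnce(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    // Runs `warmup` untimed rounds, then returns the median of `rounds` timed rounds.
    static long medianNanos(Runnable task, int warmup, int rounds) {
        for (int i = 0; i < warmup; i++) task.run();
        long[] times = new long[rounds];
        for (int i = 0; i < rounds; i++) times[i] = timeOnce(task);
        Arrays.sort(times);
        return times[rounds / 2];
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        Random rnd = new Random(42); // stable seed: same data every run
        byte[] flags = new byte[n];
        double[] x = new double[n], y = new double[n];
        for (int i = 0; i < n; i++) {
            flags[i] = (byte) (rnd.nextDouble() < 0.43 ? 1 : 0);
            x[i] = rnd.nextDouble();
            y[i] = rnd.nextDouble();
        }

        Runnable multiply = () -> {
            double s = 0;
            for (int i = 0; i < n; i++) s += x[i] * y[i] * flags[i];
            sink += s;
        };
        Runnable branch = () -> {
            double s = 0;
            for (int i = 0; i < n; i++) if (flags[i] == 1) s += x[i] * y[i];
            sink += s;
        };

        System.out.printf("no if:  %.3f ms%n", medianNanos(multiply, 20, 51) / 1e6);
        System.out.printf("if(){}: %.3f ms%n", medianNanos(branch, 20, 51) / 1e6);
        System.out.println(sink > 0); // read sink so the loops stay live
    }
}
```

Reporting the median rather than a single run reduces the influence of GC pauses and OS scheduling noise.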

It's also probably (almost) useless to micromanage Java code. Any gain will only hold on specific hardware and a specific VM version. VM code optimization is so advanced these days that you won't believe what it can do; you can be sure the executed code will be very different from your bytecode.
