
I'm writing some code in Java where, at some point, the flow of the program is determined by whether two int variables, "a" and "b", are non-zero (note: a and b are never negative, and never within integer overflow range).

I can evaluate it with

if (a != 0 && b != 0) { /* Some code */ }

Or alternatively

if (a*b != 0) { /* Some code */ }

Because I expect that piece of code to run millions of times per run, I was wondering which one would be faster. I did the experiment by comparing them on a huge randomly generated array, and I was also curious to see how the sparsity of the array (fraction of data = 0) would affect the results:

long time;
final int len = 50000000;
int arbitrary = 0;                // incremented in the timed loop so it isn't optimized away
int[][] nums = new int[2][len];

for (double fraction = 0; fraction <= 0.9; fraction += 0.0078125) {
    // Fill both rows: each entry is 0 with probability ~fraction, otherwise non-zero
    for (int i = 0; i < 2; i++) {
        for (int j = 0; j < len; j++) {
            double random = Math.random();

            if (random < fraction) nums[i][j] = 0;
            else nums[i][j] = (int) (random * 15 + 1);
        }
    }

    time = System.currentTimeMillis();

    for (int i = 0; i < len; i++) {
        if ( /* insert nums[0][i]*nums[1][i] != 0 or nums[0][i] != 0 && nums[1][i] != 0 */ ) arbitrary++;
    }
    System.out.println(System.currentTimeMillis() - time);
}

And the results show that if you expect "a" or "b" to be equal to 0 more than ~3% of the time, a*b != 0 is faster than a!=0 && b!=0:

[Graph: runtime vs. fraction of zeros for the "a AND b non-zero" condition]

I'm curious to know why. Could anyone shed some light? Is it the compiler or is it at the hardware level?

Edit: Out of curiosity... now that I've learned about branch prediction, I was wondering what the analogous comparison would show for "a OR b non-zero":

[Graph: runtime vs. fraction of zeros for the "a OR b non-zero" condition]

We do see the same branch-prediction effect, as expected; interestingly, the graph is somewhat flipped along the X-axis.

Update

1- I added !(a==0 || b==0) to the analysis to see what happens.

2- I also included a != 0 || b != 0, (a+b) != 0 and (a|b) != 0 out of curiosity, after learning about branch prediction. But they are not logically equivalent to the other expressions, because only a OR b needs to be non-zero to return true, so they are not meant to be compared for processing efficiency.

3- I also added the actual benchmark that I used for the analysis, which just increments an arbitrary int variable.

4- Some people suggested including a != 0 & b != 0, as opposed to a != 0 && b != 0, predicting that it would behave more closely to a*b != 0 because we would remove the branch-prediction effect. I didn't know that & could be used with boolean variables; I thought it was only for bitwise operations on integers.
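For what it's worth, here is a minimal, self-contained sketch (my own illustration, not part of the benchmark; class and variable names are arbitrary) showing that & on booleans is a non-short-circuiting logical AND: both comparisons always run, and only the combined result feeds a conditional branch. It matches a != 0 && b != 0 whenever the operands have no side effects.

public class NonShortCircuitAnd {
    public static void main(String[] args) {
        int a = 5, b = 0;
        // With &&, (b != 0) is skipped whenever (a != 0) is false.
        // With &, both comparisons are always evaluated.
        if ((a != 0) & (b != 0)) {
            System.out.println("both non-zero");
        } else {
            System.out.println("at least one is zero");
        }
    }
}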

Note: In the context that I was considering all this, int overflow is not an issue, but that's definitely an important consideration in general contexts.
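Relatedly, here is a quick sanity check (again a snippet of my own, with an arbitrary class name) that under these constraints, non-negative values with no overflow, the OR-style expressions from point 2 really are equivalent to each other:

public class OrEquivalents {
    public static void main(String[] args) {
        // Exhaustive check over a small non-negative range (no overflow possible here)
        for (int a = 0; a <= 100; a++) {
            for (int b = 0; b <= 100; b++) {
                boolean or = (a != 0 || b != 0);
                if (or != ((a | b) != 0) || or != ((a + b) != 0)) {
                    throw new AssertionError(a + ", " + b);
                }
            }
        }
        System.out.println("a != 0 || b != 0, (a|b) != 0 and (a+b) != 0 all agree");
    }
}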

CPU: Intel Core i7-3610QM @ 2.3GHz

Java version: 1.8.0_45
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)

Maljam
  • What about `if (!(a == 0 || b == 0))`? Microbenchmarks are notoriously unreliable; this is unlikely to really be measurable (~3% sounds like a margin of error to me). – Elliott Frisch Feb 21 '16 at 01:55
  • Try comparing `a*b != 0` with `!( a == 0 || b == 0)`. If those two results are much closer together, it's an optimization quirk. – Todd Knarr Feb 21 '16 at 01:59
  • Or `a != 0 & b != 0`. – Louis Wasserman Feb 21 '16 at 02:04
  • Branching is slow if the predicted branch is wrong. `a*b!=0` has one less branch – Erwin Bolwidt Feb 21 '16 at 02:05
  • I added !(a == 0 || b == 0) to the analysis as you guys suggested, and it's exactly the same as a != 0 && b != 0. It seems that it really depends on the number of comparisons, and that arithmetic operators are much quicker – Maljam Feb 21 '16 at 02:18
  • I'm a bit surprised that a multiply is faster than a comparison even granted the effect of pipelining @StephenC pointed out. It would be interesting to see what the JIT compiler does by way of optimizing the two forms. It's very simple to peephole optimize `a*b==0` to `(a|b) == 0`, where optimizing the short-circuited comparison is more work. Regardless, you should try the `a|b` option, since this will be no more than one cycle (it could even be "free" if dispatched with another op), whereas multiply is usually 2 to 4 cycles in modern x86's. – Gene Feb 21 '16 at 02:58
  • `(1<<16) * (1<<16) == 0` yet both are different from zero. – CodesInChaos Feb 21 '16 at 10:39
  • Something like 1 in 12 people have some form of color blindness. Graphs using solely color to distinguish points can be difficult or impossible for those people. The usual solution is to also make the points have different shapes. That said, it's a wonderful graph and a great question. – Wayne Conrad Feb 21 '16 at 12:28
  • @Gene: Your proposed optimization is not valid. Even ignoring overflow, `a*b` is zero if _one_ of `a` and `b` is zero; `a|b` is zero only if both are. – hmakholm left over Monica Feb 21 '16 at 12:32
  • Because the issue is probably branch prediction, do make sure you are testing with actual values from your program **in the order in which they appear in the program**. The taken/not-taken sequence for a given conditional branch controls how well branch prediction works. A correctly predicted branch is a fast operation. – Patricia Shanahan Feb 21 '16 at 13:31
  • Have you seen [this answer](http://stackoverflow.com/a/513259/25429)? You could use [JMH](http://openjdk.java.net/projects/code-tools/jmh/) to write a microbenchmark and get the low-level details, including the CPU branch-prediction stats. – zloster Feb 21 '16 at 14:07
  • @CodesInChaos Exactly. Of course there are very many examples like that. For example `983040 * 6422528 == 0` (as 32-bit integers) but neither factor is zero (as a 32-bit integer). If you see it as hex, with arbitrary precision my example is `0xF0000 * 0x620000 == 0x5BE00000000`, but since a 32-bit integer preserves only the 8 least significant hex digits, the product is `0x00000000`, or just zero. – Jeppe Stig Nielsen Feb 21 '16 at 15:33
  • I'm sorry. Brain cramp. I meant `(a&b)!=0`. – Gene Feb 21 '16 at 18:59
  • Premature optimization is the root of all evil. Stop doing meaningless microoptimizations and improve _algorithms_ instead. – user882813 Feb 21 '16 at 19:05
  • @Gene: That doesn't work either; take e.g. `1&2`. – Aleksi Torhamo Feb 21 '16 at 19:12
  • @user882813 Why? If you could reduce your program's execution time by over 25% with this trick, would you **not** do it? – user253751 Feb 21 '16 at 22:35
  • @immibis because the internet and peers have told him over and over again searching for and discovering such optimizations is called "premature optimization" and that "premature optimization is bad." That's why. Some guy somewhere said that, so it's universally true. Geeze, didn't you get the memo? –  Feb 22 '16 at 05:26
  • I tend to open a post "Why is using integer 1 instead of a constant calculation resulting in 1 faster?".... – dhein Feb 22 '16 at 08:18
  • To check whether this is the compiler or the hardware, you can turn off compiler optimization... – elyashiv Feb 22 '16 at 09:14
  • @TechnikEmpire There are actually two guys. One that by definition yells that doing optimizations too soon in the development process is bad - this person should be banned from discussions. But there is also the other which is the good scientist: calling out the optimizations which are implemented prematurely, without first spending enough time verifying that it does not influence the actual logic. Don't snub his wisdom! – Gimby Feb 22 '16 at 13:00
  • In a way it's the habit of using `&&` rather than `&` when the result is the same (no side-effects to consider) that's the premature optimisation. We tend to favour `&&` in C-style languages because it tends to be faster (do less work) but sometimes that's premature when it turns out that the added branch costs more than the omitted work. (In fairness, it is also more often the case that `&` is just wrong than that `&&` is wrong, but the favouring of `&&` goes back to a time in C when branch prediction wasn't as big an influence and the speed impact influenced the habits in later languages). – Jon Hanna Feb 22 '16 at 16:01
  • If branch misprediction is indeed the problem (as I suspect that it is), then you should see even better performance from the non-branching, bitwise `&` operator: `if ((a != 0) & (b != 0))`. Not only will this avoid the overflow problem, but should be considerably faster, given how slow multiplication is. It and division are the only operations that require multiple cycles; bit-twiddling is extremely fast by comparison. – Cody Gray - on strike Feb 23 '16 at 04:36
  • I am similarly surprised that bitwise and works on booleans, but Java is not my usual language. That said, I suspect one or both of my suggestions will perform similarly. – Pagefault Feb 23 '16 at 07:09
  • You write that a and b are variables, but they are not, they are more complex expressions. As noted in @Pagefault's answer, this can very well make a difference. – Carsten S Feb 23 '16 at 08:48
  • Sounds like most of my college career .... "Write code that does ..." .... Ooooh shiiiiny graphs. Hell who am I kidding I still do this. – Jack Feb 23 '16 at 21:30
  • @AleksiTorhamo Thanks. Sorry. It was a long day. Wasn't thinking clearly. – Gene Feb 24 '16 at 02:31
  • @CodyGray Nitpicking: When the operands to `&` are booleans, it is not a bitwise but a logical operator, see [JLS §15.22](https://docs.oracle.com/javase/specs/jls/se8/html/jls-15.html#jls-15.22). – siegi Feb 24 '16 at 05:08
  • @siegi Blargh! Sure enough. That's what I get as a C++ guy for commenting on a question that pops up in the "hot questions" section. :-) In fact, that may explain why the updated benchmark shows it not performing as well as I would have expected. Java's compiler or runtime environment may not be able to optimize this as efficiently as it would if it were a bitwise operation. It may still be compiling it to a branch. – Cody Gray - on strike Feb 24 '16 at 07:14
  • @HenningMakholm: As I read it, he suggests `(a|b) != 0` as an optimisation for `a != 0 || b != 0` and `(a*b) != 0` for `a != 0 && b != 0` — the former is fine, the latter risks overflow on numbers >= 2¹⁶. – PJTraill Feb 24 '16 at 11:07
  • @siegi: Surely JLS 15.22 does not specify lazy evaluation of `b` in `a&b`, which is what will determine whether a branch is used (and whether side-effects occur). But if you are only criticising “bitwise” you are right — unless we accept that a Boolean has one bit! – PJTraill Feb 24 '16 at 11:13
  • @immibis Simply because you will not get a 25% boost just by replacing a "slow" primitive operation with a "fast" (and, BTW, incorrect, as in this question) one. The good old days when you could dramatically increase computation speed just by replacing division by 2 with a bitwise shift are gone. Forever. To improve performance nowadays you should, no, not should, you _must_ switch your brain on and improve algorithms. Or, at least, choose the right one. I'm sad to say, but there are no more magic unicorns nor fairies in our world. Instead we have terabytes of "tricky optimized" low-quality code. – user882813 Feb 29 '16 at 21:20
  • @user882813 " you will not get 25% boost by just replacing "slow" primitive operation with "fast" one" - Then how come the asker has shown exactly that result? – user253751 Feb 29 '16 at 23:14
  • What if `a` and `b` are bigger numbers? – Jobin Jul 17 '17 at 12:24
  • I just want to comment on 'premature optimization is the root of all evil'. First, there is no evidence that it's premature here. Second, it's a partial quotation. What Knuth actually said was 'We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.'. And there's a lot more in the same vein. Unfortunately it has become one of the most misquoted quotations in computing history. – user207421 Feb 29 '20 at 01:26
  • "if you expect "a" or "b" to be equal to 0 more than ~3% of the time" From your graph, it seems you mean `0.3` which is 30% not 3%... it might be exactly `1 - sqrt(0.5)` which is approximately 29.3%, and then the "OR" case is `sqrt(0.5)` – Ben Voigt Apr 29 '22 at 21:55

6 Answers


I'm ignoring the issue that your benchmarking might be flawed, and taking the result at face value.

Is it the compiler or is it at the hardware level?

The latter, I think:

  if (a != 0 && b != 0)

will compile to two memory loads and two conditional branches

  if (a * b != 0)

will compile to two memory loads, a multiply, and one conditional branch.

The multiply is likely to be faster than the second conditional branch if the hardware-level branch prediction is ineffective. As you increase the ratio ... branch prediction becomes less effective.

The reason that conditional branches are slower is that they cause the instruction execution pipeline to stall. Branch prediction is about avoiding the stall by predicting which way the branch is going to go and speculatively choosing the next instruction based on that. If the prediction fails, there is a delay while the instruction for the other direction is loaded.

(Note: the above explanation is oversimplified. For a more accurate explanation, you need to look at the literature provided by the CPU manufacturer for assembly language coders and compiler writers. The Wikipedia page on Branch Predictors is good background.)
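To make the effect tangible, here is a small, self-contained toy (my own sketch, not from the original benchmark; crude timing, minimal JIT warm-up): the same branchy loop typically runs much faster once the data is sorted, because the v != 0 branch then changes direction only once.

import java.util.Arrays;
import java.util.Random;

public class BranchPredictabilityDemo {
    public static void main(String[] args) {
        // ~50% zeros, ~50% ones: close to the worst case for a branch predictor
        int[] data = new Random(42).ints(50_000_000, 0, 2).toArray();
        countNonZero(data);                        // rough JIT warm-up
        System.out.println("random order: " + timeMillis(data) + " ms");
        Arrays.sort(data);                         // all zeros first, then all ones
        System.out.println("sorted order: " + timeMillis(data) + " ms");
    }

    static long timeMillis(int[] data) {
        long start = System.nanoTime();
        int count = countNonZero(data);
        long elapsed = (System.nanoTime() - start) / 1_000_000;
        if (count < 0) throw new AssertionError(); // keep `count` observably live
        return elapsed;
    }

    static int countNonZero(int[] data) {
        int count = 0;
        for (int v : data) {
            if (v != 0) count++;                   // the conditional branch under test
        }
        return count;
    }
}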


However, there is one thing that you need to be careful about with this optimization. Are there any values where a * b != 0 will give the wrong answer? Consider cases where computing the product results in integer overflow.
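A minimal illustration, using the 1 << 16 case from the comments on the question:

public class OverflowPitfall {
    public static void main(String[] args) {
        int a = 1 << 16;                        // 65536, non-zero
        int b = 1 << 16;                        // 65536, non-zero
        System.out.println(a != 0 && b != 0);   // true
        System.out.println(a * b != 0);         // false: the exact product, 2^32,
                                                // wraps to 0 in 32-bit int arithmetic
    }
}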


UPDATE

Your graphs tend to confirm what I said.

  • There is also a "branch prediction" effect in the conditional branch a * b != 0 case, and this comes out in the graphs.

  • If you project the curves beyond 0.9 on the X-axis, it looks like 1) they will meet at about 1.0 and 2) the meeting point will be at roughly the same Y value as for X = 0.0.


UPDATE 2

I don't understand why the curves are different for the (a+b) != 0 and the (a|b) != 0 cases. There could be something clever in the branch predictor's logic. Or it could indicate something else.

(Note that this kind of thing can be specific to a particular chip model number or even version. The results of your benchmarks could be different on other systems.)

However, they both have the advantage of working for all non-negative values of a and b.

Stephen C
  • Thanks for your answer, that's quite interesting. And to address your concern about benchmarking, I just incremented an arbitrary int variable ("arb++"). I omitted it because I thought that it didn't matter as long as I use the same one. – Maljam Feb 21 '16 at 02:25
  • @StephenC Please correct me if I am wrong. For `(a != 0 && b != 0)`, you'd have to load `a`, do a `beqz` with 0, store the value; same thing for `b`. So, we have 2 lw's, 2 sw's and 2 branches. For `(a * b != 0)`, you have two lw's for `a`, `b`; one sw for storing `a*b` and then a beqz for comparison with 0. So, we have 2 lw's, 1 sw and 1 branch. Do you think this reasoning is correct? Further, in the second case, there is only one available branch, so the branch predictor will always have a probability tending to 1. – Debosmit Ray Feb 21 '16 at 02:34
  • @DebosmitRay - 1) There should be no SW's. The intermediate results will be kept in a register. 2) In the second case, there are two available branches: one to execute "some code" and the other to skip to the next statement after the `if`. – Stephen C Feb 21 '16 at 02:43
  • @StephenC you are right to be confused about a+b and a|b, because the curves _are_ the same; I think it's the colors being really close. Apologies to color blind people! – Maljam Feb 21 '16 at 03:56
  • @Maljam - I meant - I'm puzzled that they are different from the `a * b != 0` case. I would not have expected that. Maybe there is a clue in the native code that the JIT compiler produces. Or maybe we >>are<< seeing a benchmarking artefact. – Stephen C Feb 21 '16 at 04:01
  • @StephenC the result is probably different because `a*b` and `a+b` are different. branch prediction cannot work the same on those. – njzk2 Feb 21 '16 at 04:09
  • @njzk2 From a probability perspective, those cases should be symmetric about the 50% axis (the probability of zero for `a&b` and `a|b`). They are, but not perfectly; that is the puzzle. – Antonín Lejsek Feb 21 '16 at 04:41
  • @StephenC The reason why `a*b != 0` and `a+b != 0` benchmark differently is because `a+b != 0` is not at all equivalent and should never have been benchmarked. For example, with `a = 1, b = 0`, the first expression evaluates to false but the second evaluates to true. The multiply acts sort of like an **and** operator, whereas the add acts sort of like an **or** operator. – JS1 Feb 21 '16 at 20:54
  • @AntonínLejsek I think the probabilities would differ. If you have `n` zeros then the likelihood of both `a` and `b` being zero increases with `n`. In an `AND` operation, with higher `n` the probability of one of them **being non-zero** increases and the condition is met. This is opposite for an `OR` operation (probability of either one of them **being zero** increases with `n`). This is based on a mathematical perspective. I am not sure if that's how the hardware works. – WYSIWYG Feb 23 '16 at 07:42
  • What could be the difference between case 1 and case 3 in the first graph? – Amber Beriwal Mar 08 '16 at 19:12
  • One thing is what the compiler emits at the bytecode level; another is what the machine code generated by HotSpot actually does. – Thorbjørn Ravn Andersen Feb 20 '19 at 23:26
  • Yup. See the first few lines of my answer. – Stephen C Feb 21 '19 at 00:42
  • `a|b != 0` would be faster if the compiler optimizes by short-circuiting. You don't even need to load the value of b if a != 0. – Drunken Code Monkey May 16 '23 at 06:52

I think your benchmark has some flaws and might not be useful for inferring about real programs. Here are my thoughts:

  • (a|b)!=0 and (a+b)!=0 test if either value is non-zero, whereas a != 0 && b != 0 and (a*b)!=0 test if both are non-zero. So you are not comparing the timing of just the arithmetic: if the condition is true more often, it causes more executions of the if body, which takes more time too.

  • (a+b)!=0 will do the wrong thing for positive and negative values that sum to zero, so you can't use it in the general case, even if it works here. Also for a=b=0x80000000 (MIN_VALUE), the only set bit will overflow out the top.

  • Similarly, (a*b)!=0 will do the wrong thing for values that overflow. Random example: 196608 * 327680 is 0 because the true result happens to be divisible by 2³², so its low 32 bits are 0, and those bits are all you get if it's an int operation.

  • The VM will optimize the expression during the first few runs of the outer (fraction) loop, when fraction is 0, when the branches are almost never taken. The optimizer may do different things if you start fraction at 0.5.

  • Unless the VM is able to eliminate some of the array bounds checks here, there are four other branches in the expression just due to the bounds checks, and that's a complicating factor when trying to figure out what's happening at a low level. You might get different results if you split the two-dimensional array into two flat arrays, changing nums[0][i] and nums[1][i] to nums0[i] and nums1[i] (see the sketch after this list).

  • CPU branch predictors detect short patterns in the data, or runs of all branches being taken or not taken. Your randomly generated benchmark data is the worst-case scenario for a branch predictor. If real-world data has a predictable pattern, or it has long runs of all-zero and all-non-zero values, the branches could cost much less.

  • The particular code that is executed after the condition is met can affect the performance of evaluating the condition itself, because it affects things like whether or not the loop can be unrolled, which CPU registers are available, and if any of the fetched nums values need to be reused after evaluating the condition. Merely incrementing a counter in the benchmark is not a perfect placeholder for what real code would do.

  • On most systems, System.currentTimeMillis() is accurate only to about ±10 ms; System.nanoTime() is usually more accurate.
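Regarding the array-splitting point above, here is a hedged sketch (class and method names are mine, not from the question's code):

public class FlatArrays {
    // Two 1-D arrays replace nums[0][...] and nums[1][...], removing one level
    // of indirection and one bounds check per element access.
    static int countBothNonZero(int[] nums0, int[] nums1) {
        int count = 0;
        for (int i = 0; i < nums0.length; i++) {
            if (nums0[i] != 0 && nums1[i] != 0) count++;  // same condition as the question
        }
        return count;
    }

    public static void main(String[] args) {
        int[] nums0 = {1, 0, 3, 4};
        int[] nums1 = {5, 6, 0, 8};
        System.out.println(countBothNonZero(nums0, nums1)); // prints 2
    }
}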

There are lots of uncertainties, and it's always hard to say anything definite with these sorts of micro-optimizations, because a trick that is faster on one VM or CPU can be slower on another. If running the 32-bit HotSpot JVM rather than the 64-bit version, be aware that it comes in two flavors: the "Client" VM has different (weaker) optimizations than the "Server" VM.

If you can disassemble the machine code generated by the VM, do that rather than trying to guess what it does!
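(On HotSpot, one way to do that is to run with -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly, which requires the hsdis disassembler plugin to be installed.)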

Boann
  • I think it's still worth mentioning that `a|b != 0` doesn't have any of the problems of `a+b`. Regardless of the footnote about C++ compilers optimizing `a||b` into `a|b` [when it's safe](https://stackoverflow.com/questions/71039947/is-ifa-b-always-faster-than-ifa-b), e.g. NULL deref impossible. (https://godbolt.org/z/EKnYbo7En) That's something Java implementations *could* do; something to keep an eye out for in the future if not now. – Peter Cordes Feb 15 '22 at 03:53

The answers here are good, though I had an idea that might improve things.

Since the two branches and associated branch prediction are the likely culprit, we may be able to reduce the branching to a single branch without changing the logic at all.

bool aNotZero = (nums[0][i] != 0);
bool bNotZero = (nums[1][i] != 0);
if (aNotZero && bNotZero) { /* Some code */ }

It may also work to do

int a = nums[0][i];
int b = nums[1][i];
if (a != 0 && b != 0) { /* Some code */ }

The reason being, by the rules of short-circuiting, if the first boolean is false, the second should not be evaluated. The JVM has to perform an extra branch to avoid evaluating nums[1][i] if nums[0][i] was zero. Now, you may not care that nums[1][i] gets evaluated, but the compiler can't be certain that evaluating it won't throw an out-of-bounds or null-reference exception. By reducing the if block to simple booleans, the compiler may be smart enough to realize that evaluating the second boolean unnecessarily won't have negative side effects.

Pagefault
  • Upvoted although I have a feeling this doesn't *quite* answer the question. – Pierre Arlaud Feb 22 '16 at 13:26
  • That's a way to introduce a branch without changing the logic from non-branching (if the way you obtained `a` and `b` had side-effects you would have kept them). You still have `&&` so you still have a branch. – Jon Hanna Feb 22 '16 at 15:54
  • If the combined condition is more predictable than both separately, you should be avoiding short-circuit eval. e.g. doing `aZero = (nums[0][i] == 0)` etc. and `if (! (aZero | bZero) )` or something. Or I guess Java without implicit bool->int conversion would need `nums[...] == 0 ? 1 : 0`. That might require several asm instructions to actually implement, so would be slower than 2 branches if the first one (or both) predicted well. – Peter Cordes Feb 14 '22 at 18:50
  • Note that Java doesn't have a `bool` type; it's called `boolean`. – Peter Cordes Apr 30 '22 at 02:21

When we take the multiplication, even if one number is 0, the product is 0. Writing

    (a*b != 0)

evaluates the product and compares it against 0 in a single step, thereby eliminating some comparisons. As a result, fewer comparisons are performed than with the condition

    (a != 0 && b != 0)

where every element is compared with 0 and evaluated. Hence the time required is less. But I believe that the second condition might give you a more accurate solution.

Sanket Gupte
  • In the second expression, if `a` is zero then `b` need not be evaluated, since the whole expression is already false. So *every element is compared* is not true. – Kuba Wyrostek Feb 21 '16 at 21:19

You are using randomized input data, which makes the branches unpredictable. In practice branches are often (~90% of the time) predictable, so in real code the branchful code is likely to be faster.

That said, I don't see how a*b != 0 can be faster than (a|b) != 0, since integer multiplication is generally more expensive than a bitwise OR. But things like this occasionally get weird; see, for example, "Example 7: Hardware complexities" from Gallery of Processor Cache Effects.

StackedCrooked
  • `&` is not a "bitwise OR" but (in this case) a "logical AND" because both operands are booleans and it's not `|` ;-) – siegi Feb 24 '16 at 05:19
  • @siegi TIL Java '&' is actually a logical AND without short-circuiting. – StackedCrooked Feb 24 '16 at 16:37
  • `&` isn't a logical AND, it's a bitwise AND. Same as in C. So `a & b` is *not* equivalent to `a && b`. But `a|b` *is* equivalent to `a || b`, just because of the difference in how "any bits set in either" vs. "no bits set" vs. "no intersecting bits" works. – Peter Cordes Feb 14 '22 at 18:52
  • @PeterCordes: In the special case "both operands are booleans" which siegi explicitly claimed, there is no difference between logical AND and bitwise AND. For booleans, it is correct to say that `&` is logical AND without short-circuit, and `&&` is logical AND with short-circuit behavior. – Ben Voigt Apr 29 '22 at 22:03
  • Yes, but the reply to siegi's correct comment omits mention of that special case when saying "today I learned (TIL)". It is interesting that since Java doesn't do implicit conversion between `boolean` and `int`, it's not just a "logical AND" in quotes, i.e. not a bitwise AND that happens to be used on 1-bit operands like in C++. In Java it's truly a logical AND when used on `boolean` operands, and you can't do `true & 123`. – Peter Cordes Apr 30 '22 at 02:26

I know the question is old, but out of curiosity and for training I did some tests using JMH. The results were slightly different:

  • bitwise OR ((a|b) != 0) and multiplication (a*b != 0) were the fastest;
  • non-short-circuiting AND (a!=0 & b!=0) was almost as good as multiplication;
  • short-circuiting AND and OR (a!=0 && b!=0, !(a!=0 || b!=0)) were the slowest.

Disclaimer: I am nowhere near an expert in JMH.

Here is the code; I tried to reproduce the code posted in the question and added the bitwise OR:

import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@Warmup(iterations = 5, time = 100, timeUnit = TimeUnit.MILLISECONDS)
@Measurement(iterations = 10, time = 100, timeUnit = TimeUnit.MILLISECONDS)
@Fork(value = 3)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
public class MultAnd {
    
    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
            .include(MultAnd.class.getSimpleName())
            .build();
        
        new Runner(opt).run();
    }

    private static final int size = 50_000_000;
    
    @Param({"0.00", "0.10", "0.20", "0.30", "0.40", "0.45", 
            "0.50", "0.55", "0.60", "0.70", "0.80", "0.90", 
            "1.00"})
    private double fraction;

    private int[][] nums;

    @Setup
    public void setup() {
        nums = new int[2][size];
        for(int i = 0 ; i < 2 ; i++) {
            for(int j = 0 ; j < size ; j++) {
                double random = Math.random();
                if (random < fraction) 
                    nums[i][j] = 0;
                else 
                    nums[i][j] = (int) (random*15 + 1);
            }
        }
    }
    
    @Benchmark
    public int and() {
        int s = 0;
        for (int i = 0; i < size; i++) {
            if ((nums[0][i]!=0) & (nums[1][i]!=0))
                s++;
        }
        return s;
    }
    
    @Benchmark
    public int andAnd() {
        int s = 0;
        for (int i = 0; i < size; i++) {
            if ((nums[0][i]!=0) && (nums[1][i]!=0))
                s++;
        }
        return s;
    }
    
    @Benchmark
    public int bitOr() {
        int s = 0;
        for (int i = 0; i < size; i++) {
            if ((nums[0][i] | nums[1][i]) != 0)
                s++;
        }
        return s;
    }
    
    @Benchmark
    public int mult() {
        int s = 0;
        for (int i = 0; i < size; i++) {
            if (nums[0][i]*nums[1][i] != 0)
                s++;
        }
        return s;
    }
    
    @Benchmark
    public int notOrOr() {
        int s = 0;
        for (int i = 0; i < size; i++) {
            if (!((nums[0][i]!=0) || (nums[1][i]!=0)))
                s++;
        }
        return s;
    }
}
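(To run this, JMH needs to be on the classpath, e.g. the org.openjdk.jmh:jmh-core and org.openjdk.jmh:jmh-generator-annprocess artifacts; the main method above then runs all five benchmarks across the fraction values.)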

And the results:

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.

Benchmark        (fraction)  Mode  Cnt    Score    Error  Units
MultAnd.and            0.00  avgt   30   33.238 ±  0.883  ms/op
MultAnd.and            0.10  avgt   30   48.011 ±  1.635  ms/op
MultAnd.and            0.20  avgt   30   48.284 ±  1.797  ms/op
MultAnd.and            0.30  avgt   30   47.969 ±  1.463  ms/op
MultAnd.and            0.40  avgt   30   48.999 ±  2.881  ms/op
MultAnd.and            0.45  avgt   30   47.804 ±  1.053  ms/op
MultAnd.and            0.50  avgt   30   48.332 ±  1.990  ms/op
MultAnd.and            0.55  avgt   30   47.457 ±  1.210  ms/op
MultAnd.and            0.60  avgt   30  127.530 ±  3.104  ms/op
MultAnd.and            0.70  avgt   30   92.630 ±  1.471  ms/op
MultAnd.and            0.80  avgt   30   69.458 ±  1.609  ms/op
MultAnd.and            0.90  avgt   30   55.421 ±  1.443  ms/op
MultAnd.and            1.00  avgt   30   30.672 ±  1.387  ms/op
MultAnd.andAnd         0.00  avgt   30   33.187 ±  0.978  ms/op
MultAnd.andAnd         0.10  avgt   30  110.943 ±  1.435  ms/op
MultAnd.andAnd         0.20  avgt   30  177.527 ±  2.353  ms/op
MultAnd.andAnd         0.30  avgt   30  226.247 ±  1.879  ms/op
MultAnd.andAnd         0.40  avgt   30  266.316 ± 18.647  ms/op
MultAnd.andAnd         0.45  avgt   30  258.120 ±  2.638  ms/op
MultAnd.andAnd         0.50  avgt   30  259.727 ±  3.532  ms/op
MultAnd.andAnd         0.55  avgt   30  248.706 ±  1.419  ms/op
MultAnd.andAnd         0.60  avgt   30  229.825 ±  1.256  ms/op
MultAnd.andAnd         0.70  avgt   30  177.911 ±  2.787  ms/op
MultAnd.andAnd         0.80  avgt   30  121.303 ±  2.724  ms/op
MultAnd.andAnd         0.90  avgt   30   67.914 ±  1.589  ms/op
MultAnd.andAnd         1.00  avgt   30   15.892 ±  0.349  ms/op
MultAnd.bitOr          0.00  avgt   30   33.271 ±  1.896  ms/op
MultAnd.bitOr          0.10  avgt   30   35.597 ±  1.536  ms/op
MultAnd.bitOr          0.20  avgt   30   42.366 ±  1.641  ms/op
MultAnd.bitOr          0.30  avgt   30   58.385 ±  0.778  ms/op
MultAnd.bitOr          0.40  avgt   30   85.567 ±  1.678  ms/op
MultAnd.bitOr          0.45  avgt   30   32.152 ±  1.345  ms/op
MultAnd.bitOr          0.50  avgt   30   32.190 ±  1.357  ms/op
MultAnd.bitOr          0.55  avgt   30   32.335 ±  1.384  ms/op
MultAnd.bitOr          0.60  avgt   30   31.910 ±  1.026  ms/op
MultAnd.bitOr          0.70  avgt   30   31.783 ±  0.807  ms/op
MultAnd.bitOr          0.80  avgt   30   31.671 ±  0.745  ms/op
MultAnd.bitOr          0.90  avgt   30   31.329 ±  0.708  ms/op
MultAnd.bitOr          1.00  avgt   30   30.530 ±  0.534  ms/op
MultAnd.mult           0.00  avgt   30   30.859 ±  0.735  ms/op
MultAnd.mult           0.10  avgt   30   33.933 ±  0.887  ms/op
MultAnd.mult           0.20  avgt   30   34.243 ±  0.917  ms/op
MultAnd.mult           0.30  avgt   30   33.825 ±  1.155  ms/op
MultAnd.mult           0.40  avgt   30   34.309 ±  1.334  ms/op
MultAnd.mult           0.45  avgt   30   33.709 ±  1.277  ms/op
MultAnd.mult           0.50  avgt   30   37.699 ±  4.029  ms/op
MultAnd.mult           0.55  avgt   30   36.374 ±  2.948  ms/op
MultAnd.mult           0.60  avgt   30  100.354 ±  1.553  ms/op
MultAnd.mult           0.70  avgt   30   69.570 ±  1.441  ms/op
MultAnd.mult           0.80  avgt   30   47.181 ±  1.567  ms/op
MultAnd.mult           0.90  avgt   30   33.552 ±  0.999  ms/op
MultAnd.mult           1.00  avgt   30   30.775 ±  0.548  ms/op
MultAnd.notOrOr        0.00  avgt   30   15.701 ±  0.254  ms/op
MultAnd.notOrOr        0.10  avgt   30   68.052 ±  1.433  ms/op
MultAnd.notOrOr        0.20  avgt   30  120.393 ±  2.299  ms/op
MultAnd.notOrOr        0.30  avgt   30  177.729 ±  2.438  ms/op
MultAnd.notOrOr        0.40  avgt   30  229.547 ±  1.859  ms/op
MultAnd.notOrOr        0.45  avgt   30  250.660 ±  4.810  ms/op
MultAnd.notOrOr        0.50  avgt   30  258.760 ±  2.190  ms/op
MultAnd.notOrOr        0.55  avgt   30  258.010 ±  1.018  ms/op
MultAnd.notOrOr        0.60  avgt   30  254.732 ±  2.076  ms/op
MultAnd.notOrOr        0.70  avgt   30  227.148 ±  2.040  ms/op
MultAnd.notOrOr        0.80  avgt   30  180.193 ±  4.659  ms/op
MultAnd.notOrOr        0.90  avgt   30  112.212 ±  3.111  ms/op
MultAnd.notOrOr        1.00  avgt   30   33.458 ±  0.952  ms/op

or as a graph:
[Graph of the benchmark results above (JFreeChart)]

user16320675