0

I have the following pieces of code:

long start = System.currentTimeMillis();  
for(int i = 0; i < keys.length; ++i) {          
    obj.getElement(keys[i]);
}
long total = System.currentTimeMillis() - start;  
System.out.println(total/1000d + " seconds");  

And the following:

long start = System.currentTimeMillis();  
for(int i = 0; i < keys.length; ++i) {          
    obj.hasElement(keys[i]);
}
long total = System.currentTimeMillis() - start;  
System.out.println(total/1000d + " seconds");  

The implementations of these methods are:

public T getElement(int key) {  
   int idx = findIndexOfElement(key);  
   return idx >= 0? ITEMS[idx]:null;
}  

public boolean hasElement(int key) {
    return findIndexOfElement(key) >= 0;
}

Pretty straightforward. The only difference between the 2 methods is the conditional access to the table.

Problem: When actually measuring the performance of these snippets the getElement takes twice the time than the hasElement.
So for a series of tests I get ~2.5seconds for the first loop of getElement and ~0.8 secs for the second loop of hasElement.
How is it possible to have such a big difference? I understand that the conditional statement is a branch and jump but still seems to me too big.
Is there a way to improve this?

Update:
The way I measure is:

long min = Long.MAX_VALUE;  
long max = Long.MIN_VALUE;  
long run = 0;  
for(int i = 0; i < 10; ++i) {  
   long start = System.currentTimeMillis();  
    for(int i = 0; i < keys.length; ++i) {          
        obj.getElement(keys[i]);
    }
    long total = System.currentTimeMillis() - start;  
    System.out.println(total/1000d + " seconds");    
    if(total < min) {
        min = time;
    }
    if(total > max) {
        max = time;
    }
    run += time;
    for(int i = 0; i < 50; ++i) {
        System.gc();            
            try {
                Thread.sleep(1000);
        } catch (InterruptedException e) {              
            e.printStackTrace();
        }           
    }   
}
System.out.println("min=" + min + " max=" + max);  
System.out.println("avg = " +  (double)run/1000/keys.length);
Cratylus
  • 52,998
  • 69
  • 209
  • 339
  • 1
    How are you running your benchmark on these pieces of code? – Luiggi Mendoza Jul 18 '15 at 20:23
  • @LuiggiMendoza:The times I mention are the ones printed in the snippets. And I have an outer loop that calls these snippets 10 times so that I get an average – Cratylus Jul 18 '15 at 20:24
  • So you're not doing a proper benchmark... Your results are useless. – Luiggi Mendoza Jul 18 '15 at 20:25
  • @LuiggiMendoza:What's the problem? – Cratylus Jul 18 '15 at 20:26
  • 2
    Please read through [How do I write a correct micro-benchmark in Java?](http://stackoverflow.com/q/504103/1065197) to have a better understanding of this subject. – Luiggi Mendoza Jul 18 '15 at 20:27
  • @LuiggiMendoza:Please see update also – Cratylus Jul 18 '15 at 20:34
  • Bad way to measure results. Please for future times, use one of the frameworks stated in the accepted answer of that question. – Luiggi Mendoza Jul 18 '15 at 20:37
  • By the way, you should understand that running one case after another is wrong since JIT has already kick-in and optimized the code, so it's common that the second case is *faster* than the first (unless the second case has extremely low performance). – Luiggi Mendoza Jul 18 '15 at 20:39
  • Is it possible that your `hasElement()` method is optimised away after a while? Seeing as you don't do anything with the result. – biziclop Jul 18 '15 at 20:39
  • @biziclop:It should not because the arguments are random – Cratylus Jul 18 '15 at 20:41
  • @LuiggiMendoza:I thought of something like that and already ran the second snippet only and I got the same results (i.e. faster than the other) – Cratylus Jul 18 '15 at 20:42
  • @Cratylus Probably, though in absence of a correct benchmark that's just a guess. – biziclop Jul 18 '15 at 20:45
  • @LuiggiMendoza:How is Rule 1 in your post supposed to be done? `Always include a warmup phase which runs your test kernel all the way through,`? – Cratylus Jul 18 '15 at 20:56
  • Again: use one of the recommended frameworks for the benchmark. – Luiggi Mendoza Jul 18 '15 at 20:57
  • @LuiggiMendoza:I am checking out caliper (with problems).JMH to be honest doesn't do something more sophisticated that what I am doing. Just loops – Cratylus Jul 19 '15 at 11:32
  • 1
    Your benchmark is incorrect at least because the results of `getElement` and `hasElement` are not used, so JIT compiler is free to optimize the call out. Anyway there is no surprise that `getElement` works slower - it has twice more conditional branches than `hasElement`, and conditional branches here are the most expensive part of the benchmark. The first one is `idx >= 0` check. The other one is array bounds check. – apangin Jul 20 '15 at 15:57
  • @apangin:1) How does the JIT compiler know that the calls don't have any side-effects and can be optimized out? I don't think just because the return value is not used that can be a valid assumption 2) The branching point you make is fair. Is there a way to improve? – Cratylus Jul 20 '15 at 17:44
  • @apangin:You know your comment got me thinking. If you are right then how would I make sure the code is not optimized like that in a way that I don't add overhead in the measurement? – Cratylus Jul 20 '15 at 20:21
  • Use JMH - it'll take care of dead code elimination and other stuff. E.g. `blackhole.consume(obj.getElement(keys[i]));` Here is [a good article](http://java-performance.info/jmh/) to start with. – apangin Jul 20 '15 at 22:05
  • I doubt that one extra array access is a bottleneck in the real code. Please show us the full benchmark so that we could reproduce your results. – apangin Jul 20 '15 at 22:18
  • Your question lacks an actual problem. If your code incorporates random array access, that might take more time than code not performing an array access at all, but why do you expect to be able to optimize the code bearing the access to be as fast as the code without it? If you have an actual performance problem to solve, you should focus on the operation being the bottleneck, not on comparisons between that operation and other, completely unrelated operations. – Holger Jul 21 '15 at 09:56
  • @Holger:But these methods access the exact same item in the array. It is just one finds its index and the other actually returns the contents of that address. But both methods touch the same page that the slot of the array is located – Cratylus Jul 21 '15 at 17:33
  • `hasElement` does *not* touch the array. What happens inside `findIndexOfElement` is irrelevant as both methods execute it (and you didn’t post its code anyway). Besides calling `findIndexOfElement`, one method reads the array entry and the other does not. That’s an obvious difference. – Holger Jul 22 '15 at 08:04
  • @Holger:It does access the array. It checks to see if the passed `key` is in one of the elements of the array. Basically it ends up `if(ITEMS[i] == key) return i;` – Cratylus Jul 22 '15 at 17:19
  • *Again*, what happens inside `findIndexOfElement` does not matter as *both* are executing that method. The only thing that matters is the *difference*. And the difference is, inside the method itself, *not inside `findIndexOfElement`*, one method has an access and the other hasn’t. And we are talking about the code *as you have posted*, not about code you don’t have posted. – Holger Jul 22 '15 at 17:28
  • If that makes you feel better, we can say, that one method has *one more* array access. But the array access during a linear search is predictable and optimizable wheres as the one additional random index access is not optimizable. – Holger Jul 22 '15 at 17:31
  • @Holger:But if this line is executed `if(ITEMS[i] == key) return i;` this means that that part of the table is in cache right? Why the additional index access is random as soon as we return back to the caller? – Cratylus Jul 22 '15 at 17:33
  • Your code makes not clear where `ITEMS` comes from but obviously, it’s not passed to `findIndexOfElement`, so that method has to read that field itself, so there is no guaranty that it will read the same array. Still, the optimizer may coerce the reads, if there is no in-between synchronization action. Then it has to ensure that `idx` has always been used in a previous array access before optimizing, but obviously, you return a negative value if no match is found, so that must be precluded. The bottom line is, that optimization is indeed possible, but it’s not surprising if it doesn’t happen. – Holger Jul 22 '15 at 17:44
  • @Holger:Very interesting comment actually. Both methods are instance methods of the same object and the array is a member variable of that object – Cratylus Jul 22 '15 at 17:51
  • @apangin:I have updated OP with the approach I use to benchmark. The snippet is wrapped in a separate function and called several times. – Cratylus Jul 22 '15 at 21:05
  • As I've told earlier your benchmark approach is not necessarily correct. Also it is still unclear what the problem is. You have two methods: one does more than the other and should obviously work longer. What otherwise do you expect? I cannot reproduce the results you claim (2.5s vs. 0.8s) because I have no idea what `findIndexOfElement` is and what `keys` and `ITEMS` contain. – apangin Jul 22 '15 at 21:51
  • There are techniques that help answer why something works slower than something else. E.g. JMH, PrintAssembly, perf etc. But again, since there is no self-contained reproducible test, I can only suggest you use the proper instruments yourself. – apangin Jul 22 '15 at 22:06
  • @apangin:If you have time you can check my other relevant question? It shows my actual problem and how I tried to benchmark – Cratylus Jul 22 '15 at 22:17

2 Answers2

0

Is ITEMS definitely an array, and implemented as an array? If it is somehow implemented as a linked list, that would cause O(n) time instead of O(1) time on the get.

OrigamiMarie
  • 187
  • 3
0

Your branches are probably the limiting factor in the short code posted. In the getElement method there is one branch and in the hasElement method there is another one plus it calls the getElement method, making it two branches for that method.

So in summary, the number of branches are double in that method and it seems very reasonable that the runtime also is double.

vidstige
  • 12,492
  • 9
  • 66
  • 110