
While reading about optimization, I came across the topic of loop unfolding (loop unrolling). A quick Google search didn't tell me whether Java's compiler does this or not.

So the best way was to try it myself.

I was quite surprised that unfolding the loop by hand actually sped things up, since I was fairly sure modern compilers do this for me.

public static void folded() {
    System.out.println("Folded:");
    long c1 = System.currentTimeMillis();

    for (int r = 0; r < 10; r++) {
        for (int i = 0; i < 500000; i++) {
            Math.sin(i);
        }
    }
    System.out.println(System.currentTimeMillis() - c1);
}

public static void unFolded() {
    System.out.println("Unfolded:");
    long c1 = System.currentTimeMillis();

    for (int r = 0; r < 10; r++) {
        for (int i = 0; i < 500000; i += 10) {
            Math.sin(i);
            Math.sin(i + 1);
            Math.sin(i + 2);
            Math.sin(i + 3);
            Math.sin(i + 4);
            Math.sin(i + 5);
            Math.sin(i + 6);
            Math.sin(i + 7);
            Math.sin(i + 8);
            Math.sin(i + 9);
        }
    }
    System.out.println(System.currentTimeMillis() - c1);
}

Result (counter 500'000), in ms:

Folded: 453

Unfolded: 114

Result (counter 5'000'000), in ms:

Folded: 13850

Unfolded: 11929

So what should I trust: manual optimization or the compiler? In this test, my results suggest that manual optimization is better.

arccuks
  • 2
    I'm voting to close this question as off-topic because it is a code optimization question that might better belong at http://codereview.stackexchange.com/ – Freiheit Nov 10 '16 at 14:42
  • If you really want to do this kind of test, you have to look at the instructions via `javap` – Murat Karagöz Nov 10 '16 at 14:45
  • 2
    Consider reading http://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java . – bradimus Nov 10 '16 at 14:46
  • @Freiheit Code optimisation questions are not off topic on SO. – assylias Nov 10 '16 at 14:50
  • 1
    Yes, the JIT will perform loop unrolling, but (a) it will only do so after the loop has been run a certain number of times (I don't think your warmup is long enough) and (b) your measurement method is flawed in many ways (cf. link above), for example because the JIT may not run the loop at all because it has no side effects. – assylias Nov 10 '16 at 14:51
  • 1
    The warmup issue probably explains why you get better results as the number of loops increases. – assylias Nov 10 '16 at 14:53
  • This question is mostly nonsense. The `sin` operation is much more expensive than a single _predictable_ branch instruction. Loop unrolling makes sense only for tight loops with dirt-cheap body, like integer summing or counting. Any result pointing to the contrary is due to poor microbenchmarking discipline. – Marko Topolnik Nov 10 '16 at 15:12
  • @MarkoTopolnik But I still get a time difference: the unFolded version runs 1-2 sec faster. – arccuks Nov 10 '16 at 15:15
  • 1
    Yes, your microbenchmark is indeed severely flawed. Try repeating this by properly using JMH, the dedicated Java microbenchmarking framework. – Marko Topolnik Nov 10 '16 at 15:16
  • @MarkoTopolnik I know this is not the correct way to measure time, but if the difference is as big as 2 sec, I can tell it even with a chronometer. – arccuks Nov 10 '16 at 15:17
  • You have no idea in how many ways a microbenchmark can be flawed. Please read the suggested links and learn to use JMH. – Marko Topolnik Nov 10 '16 at 15:19
  • @MarkoTopolnik Yup, as people stated before, once I added a warmup phase both timings became almost identical. The unfolded version still scores about 8 ms better, but I guess to test this properly I really need to write a real benchmark, not just play around like this. – arccuks Nov 10 '16 at 15:24
  • 4
    Your code on JMH: `regular 59.482 ± 3.248 ns/op; unrolled 62.548 ± 2.797 ns/op` – Marko Topolnik Nov 10 '16 at 15:26
  • @assylias - You're right. I have retracted my vote. OP provided enough details and specifics. My mistake! http://meta.stackoverflow.com/questions/286557/is-it-okay-to-ask-code-optimization-help – Freiheit Nov 10 '16 at 19:34
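
The two flaws named in the comments above (too little warm-up, and `Math.sin` results that are never used) can be sketched in plain Java as follows. This is only an illustration under stated assumptions: the class `UnrollTiming`, its method names, and the warm-up and run counts are hypothetical and not from the question, and a hand-rolled harness like this is still far less reliable than JMH, which the comments recommend.

```java
import java.util.function.DoubleSupplier;

// Hypothetical harness for illustration only; not a replacement for JMH.
final class UnrollTiming {
    static final int N = 500_000; // same inner loop bound as in the question

    // Plain loop. The results are summed and returned so the JIT
    // cannot discard the Math.sin calls as dead code.
    static double rolled() {
        double sum = 0;
        for (int i = 0; i < N; i++) {
            sum += Math.sin(i);
        }
        return sum;
    }

    // Manually unrolled by 10, adding the terms in the same order as rolled().
    static double unrolled() {
        double sum = 0;
        for (int i = 0; i < N; i += 10) {
            sum += Math.sin(i);
            sum += Math.sin(i + 1);
            sum += Math.sin(i + 2);
            sum += Math.sin(i + 3);
            sum += Math.sin(i + 4);
            sum += Math.sin(i + 5);
            sum += Math.sin(i + 6);
            sum += Math.sin(i + 7);
            sum += Math.sin(i + 8);
            sum += Math.sin(i + 9);
        }
        return sum;
    }

    // Runs the body `warmups` times untimed, then returns the average
    // nanoseconds per call over `runs` timed calls.
    static long timeNanos(DoubleSupplier body, int warmups, int runs) {
        double sink = 0;
        for (int w = 0; w < warmups; w++) {
            sink += body.getAsDouble();
        }
        long start = System.nanoTime();
        for (int r = 0; r < runs; r++) {
            sink += body.getAsDouble();
        }
        long elapsed = System.nanoTime() - start;
        if (sink == 42.0) {
            System.out.println(sink); // keeps sink observably used
        }
        return elapsed / runs;
    }

    public static void main(String[] args) {
        // Warm-up and run counts are arbitrary assumptions.
        System.out.println("rolled   ns/call: " + timeNanos(UnrollTiming::rolled, 10, 10));
        System.out.println("unrolled ns/call: " + timeNanos(UnrollTiming::unrolled, 10, 10));
    }
}
```

Both variants return the same sum, so any remaining timing gap comes from the loop structure or from measurement noise rather than from skipped work.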

1 Answer


An unfolded loop can be useful when the unfolded operations can be parallelized. For that, many modern CPUs support vector instructions: https://en.wikipedia.org/wiki/Vector_processor

Starting with 7u40, the Java server compiler supports basic vector instructions (http://bugs.java.com/view_bug.do?bug_id=6340864), for operations like arrayA[0..n] + arrayB[0..n]. Read more in "Do any JVM's JIT compilers generate code that uses vectorized floating point instructions?"

In your case the unfolded operation is Math.sin(...), which is more than one CPU instruction. As a result, Java is not able to convert it to any known CPU vector instruction and provide a performance benefit compared to the plain loop.
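
To make the contrast concrete, here is a minimal sketch of the two loop shapes. The class `VectorShape` and its method names are hypothetical, and whether HotSpot actually emits vector instructions for the first loop depends on the JVM version, the CPU, and the flags in use; the code only illustrates the shape of each loop and proves nothing about the generated machine code.

```java
// Hypothetical illustration of the two loop shapes discussed above.
final class VectorShape {

    // Element-wise add, like arrayA[0..n] + arrayB[0..n]: each iteration is
    // independent and consists of one cheap arithmetic operation, which is
    // the kind of loop a JIT may be able to map onto vector instructions.
    static void add(float[] a, float[] b, float[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = a[i] + b[i];
        }
    }

    // Same loop structure, but the body is Math.sin, which is more than
    // one CPU instruction per element.
    static void sinAll(double[] in, double[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = Math.sin(in[i]);
        }
    }

    public static void main(String[] args) {
        float[] sum = new float[3];
        add(new float[] {1f, 2f, 3f}, new float[] {10f, 20f, 30f}, sum);
        System.out.println(java.util.Arrays.toString(sum));
    }
}
```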

terma
  • 2
    This answer is self-contradictory. It explains that the JIT compiler may unroll the loop to improve performance, which directly defeats OP's claim that unrolling the loop at the Java level could have any effect. Yet it fails to point that out, instead making the point that `sin` cannot be vectorized. By implication, it asserts a falsehood: "If `sin` _could_ be vectorized, then OP's claim would be correct". – Marko Topolnik Nov 11 '16 at 06:36