I'm trying to understand when JDK will autovectorize. I have the following set of questions (despite googling, reading, experimenting etc.). Given a simple loop as follows:

for (int i = 0, size = size(); i < size; i++) {
   a[i] = b[i] * c[i];
   method1();
   // someObject.method2();
   // someHashMap.put(b[i], c[i]);
}
  1. Why is it necessary for the method call "method1" (that appears within the loop) to be inlined for autovectorization to occur? (I can't understand why that must be necessary....)
  2. Perhaps this is a "silly" question, but what if "someObject.method2()" were uncommented? (And let's assume that method2 is a huge method, ie many lines.) Would that prevent autovectorization too? What if method2 were a tiny method (eg just 1 or 2 lines)?
  3. What if the "someHashMap" line were uncommented? Would the fact that we have an object/variable that would be shared across all the SIMD lanes cause the autovectorization to fail too? (I can't see how it could work unless jdk would somehow insert a "synchronized" keyword automatically when accessing the common object/var "someHashMap".)
  4. It seems to me that the "streaming" interface would solve the problem implied in question #3 directly above, since the "collector" logic in streams would automatically take care of merging individual hashmaps and so we would not need any "synchronized" keyword. (And in general, it almost seems like the streaming API is a perfect API to allow jdk to automatically use autovectorization, so long as there are no "outside vars" (ie no side effects) when creating the streaming code.) Does the jdk/jit compiler automatically do autovectorization as a result when the code is written using the standard streaming interface? If not, wouldn't it make sense to do so (perhaps in a future jdk version or perhaps a jdk from some other vendor)?
  5. If the body of the loop contains many many if statements etc (lots of branching and let's say further that each branch does lots of computation), would that mean that a) autovectorization is probably a BAD idea (just as it would be for a GPU) and b) the jit compiler is smart enough to determine that autovectorization is a bad idea and so it won't autovectorize?
  6. I am currently using Oracle jdk8, but do the answers change above if one uses jdk9 or jdk10, etc.?
Jonathan Sylvester
  • 2
    Please take a look at [this](https://stackoverflow.com/q/10784951/5223047) question. It depends on the Java version and its JIT compiler. Another good explanation [here](http://prestodb.rocks/code/simd/). – ltlBeBoy Sep 03 '18 at 14:22
  • If `method1()` could modify a[], b[], or c[], then obviously it has to inline. If escape analysis can prove the arrays are "private", then sure I guess you could auto-vectorize and call the function 4 times for each SIMD vector, if your compiler was smart enough to do that. Auto-vectorizing in a JIT compiler is already tough (because it has to compile fast), though, compared to an ahead-of-time C compiler. – Peter Cordes Sep 03 '18 at 17:28
  • 1
    @Peter - I wouldn't say it is "obvious" - in fact it's more a limitation of the current Java compiler: it doesn't do IPA except indirectly through inlining. Certainly a more sophisticated compiler could analyze the function for various attributes such as "pureness" and use those to optimize call sites even without inlining. Some compilers (eg GCC) do, and in principle it's easier in Java since you don't have this whole interposition problem. – BeeOnRope Sep 03 '18 at 22:34

1 Answer

2

To answer your question (1), in principle, a Java compiler could optimize in the presence of a non-inlined method1() call, if it analyzed method1() and determined that it doesn't have any side-effects that would affect the auto-vectorization. In particular, the compiler could prove that the method was "const" (no side effects and no reads from global memory) which in general would enable many optimizations at the call site without inlining. It could also perhaps prove more restricted properties, such as not reading or writing to arrays of a certain type, which would also be enough to allow auto-vectorization to proceed in this case.
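As a rough illustration of the kind of call this paragraph has in mind, here is a minimal sketch in which the helper `scale` (a hypothetical stand-in for `method1()`, not from the question) is tiny, has no side effects and touches no arrays. The class and method names are assumptions made up for this example, and whether HotSpot actually vectorizes the loop depends on the JVM version and flags; the code only shows the shape of the loop, not a guaranteed outcome.

```java
import java.util.Arrays;

final class InlineSketch {
    // Hypothetical stand-in for method1(): small, no side effects,
    // reads and writes no arrays. A JIT would normally just inline it.
    static int scale(int x) {
        return x * 2;
    }

    // Element-wise product with a call in the loop body.
    // Once scale() is inlined, the body is plain arithmetic on a, b and c.
    static void multiply(int[] a, int[] b, int[] c) {
        int n = Math.min(a.length, Math.min(b.length, c.length));
        for (int i = 0; i < n; i++) {
            a[i] = scale(b[i] * c[i]);
        }
    }

    public static void main(String[] args) {
        int[] a = new int[4];
        multiply(a, new int[]{1, 2, 3, 4}, new int[]{5, 6, 7, 8});
        System.out.println(Arrays.toString(a)); // prints [10, 24, 42, 64]
    }
}
```

If you want to see what the JIT decided for a loop like this, HotSpot's diagnostic flags `-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining` report inlining decisions; the output format varies between versions.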

In practice, however, I am not aware of any Java compiler that can do this optimization today. If this answer is to be believed, in Hotspot: "a [not-inlined] method call is typically opaque for JIT compiler." Most Java compilers are based in one way or another on Hotspot, so I don't expect there is a sophisticated Java compiler out there that can do this if Hotspot can't.

This answer also covers some reasons why such an interprocedural analysis (IPA) is likely to be both difficult and not particularly useful. In particular, methods about which non-trivial things can be proven are often small enough that they'd be inlined anyway. I'm not sure if I totally agree: one could also argue that Java inlines aggressively partly because it doesn't do IPA, so strong IPA would perhaps open up the ability to do less inlining and consequently reduce runtime code footprint and JIT times.

The other method variants you ask about in (2) or (3) don't change anything: the compiler would still need IPA to allow it to vectorize, and as far as I know Java compilers don't have it.

(4) and (5) seem like they should be asked as totally separate questions.

About (6) I don't think it has changed, but it would make a good question for the OpenJDK hotspot mailing lists: I think you'd get a good answer.

Finally, it's worth noting that even in the absence of IPA and knowing nothing about method1(), a compiler could optimize the math on a, b and c if it could prove none of them had escaped. This seems pretty useless in general though: it would mean that all those variables would have been allocated in this function (or some function inlined into this one), whereas I would imagine that in most realistic scenarios at least one of the three is passed in by the caller.
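To make the non-escaping case concrete, here is a minimal sketch in which all three arrays are allocated inside the method and never handed to anyone else, so an opaque `method1()` cannot possibly observe or modify them. The class name, the `calls` counter and the body of `method1()` are assumptions invented for the example. I am not claiming HotSpot performs this optimization; the sketch only shows the situation in which it would be legal.

```java
final class EscapeSketch {
    // Hypothetical side effect so that method1() is not trivially removable.
    static int calls;

    // Stand-in for an opaque call: it has a side effect, but it has no way
    // to reach the arrays local to sumOfProducts().
    static void method1() {
        calls++;
    }

    static long sumOfProducts(int n) {
        // All three arrays are created here and never escape this method.
        int[] a = new int[n];
        int[] b = new int[n];
        int[] c = new int[n];
        for (int i = 0; i < n; i++) {
            b[i] = i;
            c[i] = i + 1;
        }
        // Same shape as the loop in the question.
        for (int i = 0; i < n; i++) {
            a[i] = b[i] * c[i];
            method1();
        }
        long sum = 0;
        for (int v : a) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumOfProducts(4)); // prints 20
    }
}
```

As soon as any of `a`, `b` or `c` comes in as a parameter instead, the compiler can no longer rule out that `method1()` touches it, which is the more realistic case described above.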

BeeOnRope
  • 1
    There used to be [`gcj`, an ahead-of-time Java compiler front-end for gcc](https://en.wikipedia.org/wiki/GNU_Compiler_for_Java). I'd assume it could do gcc's full range of inter-procedural optimizations. But it's abandoned, last release with gcc6.4 in 2017, and was in maintenance-only mode for a while before that, so no longer counts as a modern Java compiler. – Peter Cordes Sep 03 '18 at 23:43