
I'm looking for AVX/AVX2 support in a Java 11+ project. I found some Java 8 materials, but most of them are outdated. I know it is possible to use AVX via JNI, but I wonder whether it is possible to eliminate (or at least minimize) the JNI overhead. I want to optimise some operations on matrices, and I would like those operations to use AVX explicitly rather than leaving that decision to the JIT.

I am thinking of something like manual AVX support, based on annotations or unsafe/incubator classes, that would be provided by the JVM implementation.

I found Project Panama, but there is very little information about this project.

Do you have any experience with or thoughts on this topic? Are there any other options?

Notes:

  1. It is possible to use AVX by JNI: https://stackoverflow.com/a/10809123/12292000
  2. Project Panama website: https://openjdk.java.net/projects/panama/
Jakub Biały
  • What do you want to do? The JIT will generate AVX(2) instructions, if your CPU supports it. – JCWasmx86 Aug 05 '20 at 16:31
  • Is there any reliable source showing that the JIT will generate AVX/AVX2 instructions? – Jakub Biały Aug 05 '20 at 16:34
  • I want to optimise operations on matrices. – Jakub Biały Aug 05 '20 at 16:36
  • I didn't find a good source, but here is code in the JVM that emits AVX instructions, from which you can conclude that AVX is used: https://github.com/openjdk/jdk/blob/aee74901f73bc0ec9bca31694c0714c7b84c6f5d/src/hotspot/cpu/x86/assembler_x86.cpp#L1779 – JCWasmx86 Aug 05 '20 at 16:38
  • Furthermore, try `-XX:+PrintAssembly` – JCWasmx86 Aug 05 '20 at 16:39
  • Sure, but then I am left to the mercy of the JIT, especially when the code I want to optimize is called only a small number of times. – Jakub Biały Aug 05 '20 at 16:53
  • @JCWasmx86: Some algorithms auto-vectorize easily; others are hard even for ahead-of-time compilers like GCC or clang to vectorize into good SIMD asm, but possible for humans using intrinsics. Auto-vectorization, especially from a JIT compiler, is not going to cover all cases, especially anything where interesting shuffles would be useful. (I'd only have any hope of a JIT doing a good job for pure vertical SIMD, like `a[i] += b[i] * c[i];`) – Peter Cordes Aug 05 '20 at 17:08
  • First, before trying to optimize JNI overhead, have you already proved that this overhead is significant in your case? JNI isn't that slow. And there are [tricks](https://stackoverflow.com/questions/36298111/is-it-possible-to-use-sun-misc-unsafe-to-call-c-functions-without-jni) to reduce overhead even further. – apangin Aug 30 '20 at 14:12
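
For reference, the "pure vertical SIMD" loop from the comments could look like the minimal Java sketch below. The class name `VerticalLoop`, the method name `multiplyAdd`, the array size and the iteration count are assumptions made up for illustration. Nothing here forces AVX: whether the JIT vectorizes the loop depends on the JVM build, its flags and the CPU.

```java
import java.util.Arrays;

// Hypothetical example class; all names are illustrative only.
public final class VerticalLoop {

    // Element-wise a[i] += b[i] * c[i]: each lane depends only on the
    // same index of the inputs, with no shuffles across elements.
    public static void multiplyAdd(float[] a, float[] b, float[] c) {
        for (int i = 0; i < a.length; i++) {
            a[i] += b[i] * c[i];
        }
    }

    public static void main(String[] args) {
        float[] a = new float[1024];
        float[] b = new float[1024];
        float[] c = new float[1024];
        Arrays.fill(b, 2f);
        Arrays.fill(c, 3f);

        // Call the method many times so it becomes hot enough for the
        // optimizing JIT compiler to consider it (count is an arbitrary guess).
        for (int iter = 0; iter < 20_000; iter++) {
            multiplyAdd(a, b, c);
        }
        System.out.println(a[0]);
    }
}
```

To check what was actually generated, the `-XX:+PrintAssembly` flag mentioned above is a diagnostic option, so it has to be combined with `-XX:+UnlockDiagnosticVMOptions`, and it needs the hsdis disassembler plugin to be available to the JVM.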
