I'm researching methods for computing expensive vector operations in Java, e.g. dot products or multiplications of large matrices. There are a few good threads here on this topic, such as this and this.
It appears that there is no reliable way of having the JIT compile code to use CPU vector instructions (SSE2, AVX, MMX...). Moreover, high-performance linear algebra libraries (ND4J, jblas, ...) do in fact make JNI calls to BLAS/LAPACK libraries for the core routines. And I understand BLAS/LAPACK packages to be the de facto standard choices for native linear algebra computations.
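To make the native-call approach concrete, here is a minimal sketch of what calling a BLAS routine from Java can look like. It assumes the netlib-java wrapper (com.github.fommil.netlib.BLAS) is on the classpath; that wrapper dispatches to a native BLAS when one is installed and falls back to a Java implementation otherwise. This is just an illustration, not a claim that any particular library uses exactly this API.

```java
import com.github.fommil.netlib.BLAS;

public class NativeDotExample {
    public static void main(String[] args) {
        int n = 1_000_000;
        double[] x = new double[n];
        double[] y = new double[n];
        for (int i = 0; i < n; i++) {
            x[i] = i;
            y[i] = 2.0 * i;
        }

        // ddot is the standard BLAS level-1 dot product; the JNI boundary
        // is crossed once per call, regardless of the vector length.
        double result = BLAS.getInstance().ddot(n, x, 1, y, 1);
        System.out.println(result);
    }
}
```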
On the other hand, others (JAMA, ...) implement algorithms in pure Java without native calls.
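For contrast, a pure-Java dot product is just a scalar loop; whether the JIT turns it into SIMD code is exactly the uncertainty mentioned above, since auto-vectorization depends on the JVM version, flags, and target CPU. A minimal sketch:

```java
public final class PureJavaDot {

    // Plain scalar loop; the JIT may or may not emit vector instructions
    // for this, depending on the JVM and the CPU it runs on.
    public static double dot(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0};
        double[] y = {4.0, 5.0, 6.0};
        System.out.println(dot(x, y)); // 32.0
    }
}
```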
My questions are:
- What are the best practices here?
- Is making native calls to BLAS/LAPACK actually a recommended choice? Are there other libraries worth considering?
- Is the overhead of JNI calls negligible compared to the performance gain? Does anyone have experience with where the threshold lies (e.g., how small does an input need to be before a JNI call becomes more expensive than a pure Java routine)? See the benchmark sketch after this list.
- How big is the portability tradeoff?
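Regarding the JNI-overhead question above, a JMH benchmark that sweeps the input size seems like the most reliable way to find the crossover point on a given machine. Here is a sketch; the nativeBlas method again assumes netlib-java is available, so swap in whichever library you are actually evaluating.

```java
import java.util.Random;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;

import com.github.fommil.netlib.BLAS;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class DotProductBenchmark {

    // Sweep the vector length to locate the size at which the native call
    // starts to win despite the per-call JNI overhead.
    @Param({"16", "256", "4096", "65536"})
    int n;

    double[] x, y;

    @Setup
    public void setup() {
        Random rnd = new Random(42);
        x = new double[n];
        y = new double[n];
        for (int i = 0; i < n; i++) {
            x[i] = rnd.nextDouble();
            y[i] = rnd.nextDouble();
        }
    }

    @Benchmark
    public double pureJava() {
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += x[i] * y[i];
        }
        return sum; // returning the result keeps the JIT from eliminating the loop
    }

    @Benchmark
    public double nativeBlas() {
        // Crosses the JNI boundary once per call when a native BLAS is installed.
        return BLAS.getInstance().ddot(n, x, 1, y, 1);
    }
}
```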
I hope this question is helpful both to those who develop their own computation routines and to those who just want to make an educated choice between existing implementations.
Insights are appreciated!