
As part of one of my CS classes, I have to write a matrix class in Java, with some methods implemented both in Java and in C++ via the Java Native Interface, and measure the difference in execution time.

Writing and debugging both versions was simple enough, and after about 3 hours spent mostly googling how to get the interface to work, I wound up with the following code:

Matrix.java:

public class Matrix {

    private double[] data;
    private int width, height;

    public Matrix(int h, int w) {
        width = w;
        height = h;
        data = new double[w * h];
    }

    public static void main(String[] args) {
        /*  takes 3 parameters u, v and w, creates two matrices m1 and m2, dimensions u*v and v*w
         *  fills them with random doubles, multiplies m1 * m2 with both methods
         *  reports time elapsed and checks equality of result */
    }

    public Matrix multiply(Matrix mat)       { return multiply(mat, false); }
    public Matrix multiplyNative(Matrix mat) { return multiply(mat, true);  }

    public Matrix multiply(Matrix mat, boolean natively) {
        int u, v, w;
        u = this.height;
        w = mat.width;
        Matrix res = new Matrix(u, w);
        if(this.width == mat.height) v = this.width;
        else return res;
        if(natively) multiplyC(this.data, mat.data, res.data, u, v, w);
        else {
            for(int i=0; i<u; i++) {
                for(int j=0; j<w; j++) {
                    double elem = 0.0;
                    for(int k=0; k<v; k++) {
                        elem += this.data[i*v+k] * mat.data[k*w+j];
                    }
                    res.data[i*w+j] = elem;
                }
            }
        }
        return res;
    }

    public static native void multiplyC(double[] a, double[] b, double[] r, int i, int j, int k);

    // SNIP: equals and random-prefill methods

    static {
        System.loadLibrary("Matrix");
    }
}

Matrix.cpp:

#include "Matrix.h"

JNIEXPORT void JNICALL Java_Matrix_multiplyC(JNIEnv *env, jclass,
                jdoubleArray a, jdoubleArray b, jdoubleArray res,
                jint u, jint v, jint w) {

    jdouble* mat1 = env->GetDoubleArrayElements(a, 0);
    jdouble* mat2 = env->GetDoubleArrayElements(b, 0);
    jdouble* mat_res = env->GetDoubleArrayElements(res, 0);

    for(int i=0; i<u; i++) {
        for(int j=0; j<w; j++) {
            jdouble elem = 0.0;
            for(int k=0; k<v; k++) {
                elem += mat1[i*v+k] * mat2[k*w+j];
            }
            mat_res[i*w+j] = elem;
        }
    }

    env->ReleaseDoubleArrayElements(a, mat1, 0);
    env->ReleaseDoubleArrayElements(b, mat2, 0);
    env->ReleaseDoubleArrayElements(res, mat_res, 0);
}

However, for some reason the Java implementation is as fast or faster for most input sizes, which is definitely not the result I expected after talking to some classmates.

Here is some sample output data for different matrix sizes, taken from my Debian virtual box:

axim@hackbox:~/Desktop/prcpp/jni$ java -Djava.library.path=. Matrix 5 12 8
time taken in Java: 11452ns
time taken in C++:  20990ns
results equal:      true
axim@hackbox:~/Desktop/prcpp/jni$ java -Djava.library.path=. Matrix 20 48 32
time taken in Java: 5439887ns
time taken in C++:  5492423ns
results equal:      true
axim@hackbox:~/Desktop/prcpp/jni$ java -Djava.library.path=. Matrix 80 192 128
time taken in Java: 19726130ns
time taken in C++:  25375681ns
results equal:      true
axim@hackbox:~/Desktop/prcpp/jni$ java -Djava.library.path=. Matrix 320 768 512
time taken in Java: 194357345ns
time taken in C++:  384648461ns
results equal:      true
axim@hackbox:~/Desktop/prcpp/jni$ java -Djava.library.path=. Matrix 1280 3072 2048
time taken in Java: 58514495266ns
time taken in C++:  116695035710ns
results equal:      true

As you can see, the native version quite consistently takes longer to run. The ratio between the two seems erratic and doesn't appear to follow a trend, but it is relatively stable when I rerun the same sizes multiple times.

To make this even weirder, on my MacBook the native version follows an entirely different curve: it starts similarly, being nearly 2x slower for small sizes; at medium dimensions (around 100-200 rows/columns) it finishes in 20-30% of the Java time; then at big sizes the two are neck-and-neck again.

axim@ax1m-MBP:~/Desktop/CodeStuff/prcpp/a1/matrix$ java Matrix 5 12 8
time taken in Java:     32454ns
time taken in C++:      43379ns
results equal:          true
axim@ax1m-MBP:~/Desktop/CodeStuff/prcpp/a1/matrix$ java Matrix 20 48 32
time taken in Java:     1278592ns
time taken in C++:      103246ns
results equal:          true
axim@ax1m-MBP:~/Desktop/CodeStuff/prcpp/a1/matrix$ java Matrix 80 192 128
time taken in Java:     12594845ns
time taken in C++:      2604591ns
results equal:          true
axim@ax1m-MBP:~/Desktop/CodeStuff/prcpp/a1/matrix$ java Matrix 320 768 512
time taken in Java:     1272993352ns
time taken in C++:      1217730765ns
results equal:          true
axim@ax1m-MBP:~/Desktop/CodeStuff/prcpp/a1/matrix$ java Matrix 1280 3072 2048
time taken in Java:     110882859155ns
time taken in C++:      102803692425ns
results equal:          true

The third call here is about what I was expecting after talking to my classmates, but the program will need to handle larger data as per the assignment. If anyone could explain what the heck is going on here, that would be great.

Mohit Tyagi
Axim
  • Interfacing the C++ code through jni comes with a cost, that a good java compiler may eliminate. – user0042 Oct 02 '17 at 18:27
  • This question seems to be a better fit for http://codereview.stackexchange.com/ , although it would help to phrase it more as "seeking performance improvements for the native method call (C++ implementation), to surpass that of the java code" – Justin Oct 02 '17 at 18:30
  • IIRC, arrays are copied when passed through the JNI, which could be the entire reason for your performance hit – Justin Oct 02 '17 at 18:31
  • Related: https://stackoverflow.com/q/7699020/1896169 – Justin Oct 02 '17 at 18:33
  • 1) You didn't list the compiler options you used to create the C++ module. We have no idea if you turned on optimizations or not. 2) You didn't measure the C++ in an isolated fashion. Where are your calls *within the C++ code* to start / stop a timer to see how the real code performs? – PaulMcKenzie Oct 02 '17 at 18:43
  • related: *[Why is my native C++ code running so much slower than Java on Android?](https://stackoverflow.com/a/46475506/192373)* – Alex Cohn Oct 02 '17 at 19:46
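
The suggestion above to start and stop a timer inside the C++ code could be sketched roughly as follows. `time_ns` is a hypothetical helper name, not something from the question; it only relies on `std::chrono::steady_clock` from the standard library.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: runs `work` once and returns the elapsed wall-clock
// time in nanoseconds, measured with a monotonic clock.
template <typename F>
std::int64_t time_ns(F&& work) {
    auto start = std::chrono::steady_clock::now();
    work();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

Wrapping only the triple loop of `Java_Matrix_multiplyC` in a lambda passed to `time_ns`, and comparing that number with the time Java measures around the whole native call, would separate the cost of the arithmetic from the cost of getting the arrays in and out through JNI.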

1 Answer


Try to use -O3 while compiling your code ;)

Also, you don't have to commit changes for arrays that are only inputs. If you release the arrays that don't need to be passed back to Java with JNI_ABORT, the C++ version gets faster. Timings from my machine (they appear to be in milliseconds):

-O3

java -Djava.library.path=. -cp . Matrix 5 12 8
C++: 0
java -Djava.library.path=. -cp . Matrix 20 48 32
C++: 0
java -Djava.library.path=. -cp . Matrix 80 192 128
C++: 2
java -Djava.library.path=. -cp . Matrix 320 768 512
C++: 1254
java -Djava.library.path=. -cp . Matrix 1280 3072 2048
C++: 104179

-O0

java -Djava.library.path=. -cp . Matrix 5 12 8
C++: 0
java -Djava.library.path=. -cp . Matrix 20 48 32
C++: 0
java -Djava.library.path=. -cp . Matrix 80 192 128
C++: 7
java -Djava.library.path=. -cp . Matrix 320 768 512
C++: 2400
java -Djava.library.path=. -cp . Matrix 1280 3072 2048
C++: 183814

-O3 + JNI_ABORT

java -Djava.library.path=. -cp . Matrix 5 12 8
C++: 0
java -Djava.library.path=. -cp . Matrix 20 48 32
C++: 0
java -Djava.library.path=. -cp . Matrix 80 192 128
C++: 3
java -Djava.library.path=. -cp . Matrix 320 768 512
C++: 1121
java -Djava.library.path=. -cp . Matrix 1280 3072 2048
C++: 96696

Java

java -Djava.library.path=. -cp . Matrix 5 12 8
Java: 0
java -Djava.library.path=. -cp . Matrix 20 48 32
Java: 1
java -Djava.library.path=. -cp . Matrix 80 192 128
Java: 13
java -Djava.library.path=. -cp . Matrix 320 768 512
Java: 1242
java -Djava.library.path=. -cp . Matrix 1280 3072 2048
Java: 101324

You can read more about JNI_ABORT here: http://jnicookbook.owsiak.org/recipe-No-013/
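
For reference, the JNI_ABORT change could look roughly like the sketch below. The JNI function keeps the name and signature from the question; `multiply_kernel`, the null checks, and the `__has_include` guard are my additions (the guard is only there so the plain kernel also compiles on a machine without the JDK headers), and the build line in the comment assumes a Linux JDK layout.

```cpp
#include <cassert>

// Plain C++ kernel: r (u x w) = a (u x v) * b (v x w), all row-major.
// Hypothetical helper name; kept free of JNI types so it can be tried
// without a JVM.
void multiply_kernel(const double* a, const double* b, double* r,
                     int u, int v, int w) {
    for (int i = 0; i < u; i++) {
        for (int j = 0; j < w; j++) {
            double elem = 0.0;
            for (int k = 0; k < v; k++) {
                elem += a[i * v + k] * b[k * w + j];
            }
            r[i * w + j] = elem;
        }
    }
}

// JNI glue, compiled only when the JDK headers are on the include path
// (__has_include needs C++17 or a compiler that offers it as an extension).
// Assumed build line on Linux:
//   g++ -O3 -shared -fPIC -I"$JAVA_HOME/include" -I"$JAVA_HOME/include/linux" \
//       Matrix.cpp -o libMatrix.so
#if __has_include(<jni.h>)
#include <jni.h>

extern "C" JNIEXPORT void JNICALL Java_Matrix_multiplyC(
        JNIEnv* env, jclass,
        jdoubleArray a, jdoubleArray b, jdoubleArray res,
        jint u, jint v, jint w) {
    // Each call may hand back a copy of the Java array; it returns null on failure.
    jdouble* mat1 = env->GetDoubleArrayElements(a, nullptr);
    jdouble* mat2 = env->GetDoubleArrayElements(b, nullptr);
    jdouble* mat_res = env->GetDoubleArrayElements(res, nullptr);

    if (mat1 && mat2 && mat_res) {
        multiply_kernel(mat1, mat2, mat_res, u, v, w);
    }

    // Inputs were only read: JNI_ABORT frees the buffer without copying back.
    if (mat1) env->ReleaseDoubleArrayElements(a, mat1, JNI_ABORT);
    if (mat2) env->ReleaseDoubleArrayElements(b, mat2, JNI_ABORT);
    // The result has to reach Java: mode 0 copies back (if needed) and frees.
    if (mat_res) env->ReleaseDoubleArrayElements(res, mat_res, 0);
}
#endif
```

Only the release mode of the two input arrays differs from the code in the question; the result array still uses mode 0 so that the product is visible on the Java side.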

If I were writing this code, I would pass u, v and w to C++, create the arrays there, build the output array there as well, and pass it back to Java. Way less copying of data ;)
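
A minimal sketch of that idea, under my own assumptions: `multiply_random`, the fixed seed, and the native method name `multiplyRandomC` are all hypothetical, and the Java side would need a matching declaration such as `public static native double[] multiplyRandomC(int u, int v, int w);`. As in any JNI code, the glue part needs the JDK headers, so it is guarded with `__has_include`.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: builds two random row-major matrices natively
// (u x v and v x w), multiplies them, and returns the u x w product.
std::vector<double> multiply_random(int u, int v, int w, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> dist(0.0, 1.0);

    std::vector<double> a(static_cast<std::size_t>(u) * v);
    std::vector<double> b(static_cast<std::size_t>(v) * w);
    for (double& x : a) x = dist(gen);
    for (double& x : b) x = dist(gen);

    std::vector<double> r(static_cast<std::size_t>(u) * w, 0.0);
    for (int i = 0; i < u; i++) {
        for (int j = 0; j < w; j++) {
            double elem = 0.0;
            for (int k = 0; k < v; k++) {
                elem += a[i * v + k] * b[k * w + j];
            }
            r[i * w + j] = elem;
        }
    }
    return r;
}

// JNI glue, compiled only when the JDK headers are available.
#if __has_include(<jni.h>)
#include <jni.h>

extern "C" JNIEXPORT jdoubleArray JNICALL Java_Matrix_multiplyRandomC(
        JNIEnv* env, jclass, jint u, jint v, jint w) {
    std::vector<double> r = multiply_random(u, v, w, 42u);  // seed is arbitrary
    jsize len = static_cast<jsize>(r.size());
    jdoubleArray out = env->NewDoubleArray(len);  // null if allocation fails
    if (out != nullptr) {
        // The only copy across the JNI boundary: the finished result.
        env->SetDoubleArrayRegion(out, 0, len, r.data());
    }
    return out;
}
#endif
```

The trade-off is that the input matrices never exist on the Java side, so checking the result against the pure Java implementation would require generating the same inputs in both places.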

Oo.oO