7

I've read from several other reports that people generally get around 4-80 ns on a plain, basic JNI call:

From What makes JNI calls slow?

For trivial native methods, last year I found calls to average 40 ns on my Windows desktop, and 11 ns on my Mac desktop ..

From Possible increase of performace using JNI?

However JNI calls often take around 30 ns ..

When I call simple methods in my JNI code (by simple I mean no more than one argument of time int returning type int), I'm getting a round trip call time (measured with System.nanoTIme) of between 50,000-80,000 ns.

If I do a VM "warm-up" and run the call a few hundred times before timing it, I still get about 2000-4000 ns (800-1000 below). (As stated above, I have heard others report < 100 ns.. and having 10-20 times higher than that does add up when making frequent calls.)

Is this normal speed? What could be causing my native code to be called so much slower?

UPDATE:

JNIEXPORT jint JNICALL Java_com_snap2d_gl_RenderControl_correctGammaNative
  (JNIEnv *env, jobject obj, jint pixel) {
    return X2D_correctGamma(pixel, 1.0f);
}

Where X2D_correctGamma(int,float) is a method to correct a pixel's gamma value (I've implemented native code since posting).

Java benchmarking:

    for(int i = 0; i < 100; i++) {
        long t1 = System.nanoTime();
        correctGammaNative(0xFFF);
        long t2 = System.nanoTime();
        System.out.println(t2 - t1);
    }

That's the "warm-up" code. Most of the printlns read 800-1000ns after the initial call.

Unfortunately, I may have to scrap this, because it's supposed to be used in rendering, and calling this thousands of times per second is bringing the frame rate down to 1 FPS.

System Info:

Behaves similarly on: JDK1.6.0_32 (64-bit), JDK1.7.0_04 (64-bit), and JRE1.7.0_10 (32-bit)

Windows 7 64-bit

16GB RAM

i7-3770 quad-core CPU @ 3.4-3.9ghz

GNU GCC MinGW Compilers (32-bit and 64-bit)

Community
  • 1
  • 1
bgroenks
  • 1,859
  • 5
  • 34
  • 63
  • i would think that this depends on what resources the DLL task is using... – mre Jan 11 '13 at 00:34
  • 1
    what does your native method do? does it just return the argument, or does it do something more? – jdigital Jan 11 '13 at 00:36
  • It literally... does nothing. – bgroenks Jan 11 '13 at 00:37
  • Define resources. Included headers or actual memory/system resources? – bgroenks Jan 11 '13 at 00:37
  • Maybe it's the time for jni to find your dll in the $PATH, to load the DLL and invoke the call. Try a second call, immediately after, and check the roundtrip. – Luigi R. Viggiano Jan 11 '13 at 00:40
  • @LuigiR.Viggiano ghostsoldier claimed "If I do a VM "warm-up" and run the call a few hundred times before timing it, I get about 2000-4000 ns." – Alex DiCarlo Jan 11 '13 at 00:42
  • I'm not a JNI expert, but someone who is would need more information like JVM, OS, etc. – Alex DiCarlo Jan 11 '13 at 00:43
  • 1
    So, the "warm up" it's possibly the time taken to scan the PATH, find the DLL from the filesystem, load it in memory then dynamically invoke the native function. Can you check how long is your PATH env var? 50/80ms it's not that long time. I don't find it particuarly slow: 50-80ms to find and load a DLL is not so bad. I would expect 0-1ms for the subsequent calls though, since your method is doing *nothing*. Is there any parameters/return value that need to be converted? – Luigi R. Viggiano Jan 11 '13 at 00:43
  • @LuigiR.Viggiano OP _is_ getting 0-1ms, because he measures nanoseconds, not milliseconds :) – Anton Kovalenko Jan 11 '13 at 00:48
  • I thought loadLibrary() got the DLL from the file system... do you have to call the function for it to be fully loaded? – bgroenks Jan 11 '13 at 00:51
  • what is "OP"? 0-1ms sounds normal to me. – Luigi R. Viggiano Jan 11 '13 at 00:52
  • 2
    "Original poster" It just seems a lot of others report 0-100 ns... and having 10-20 times higher than that does add up when making frequent calls. – bgroenks Jan 11 '13 at 00:57
  • 1
    Can you show the code block you're timing and the method of timekeeping that you use? – us2012 Jan 11 '13 at 00:58
  • Updated to compensate for requested info. – bgroenks Jan 11 '13 at 01:44
  • 1
    Have you considered running the gamma correction on the entire image rather than one pixel at a time? – jdigital Jan 11 '13 at 01:48
  • What using BufferedImageOp? That's my fallback plan. I wanted to do it myself for fun. :D – bgroenks Jan 11 '13 at 01:49
  • And have you considered a static method so that you don't need the object parameter? – jdigital Jan 11 '13 at 01:50
  • No. That's a possibility. How much of an effect would that have exactly on performance? – bgroenks Jan 11 '13 at 01:53
  • 1
    Am I reading this correctly - you're calling a native method to modify a *single* pixel with some cheap arithmetic operation? AFAIK, the JVM cannot in general inline such functions. Why not do a direct implementation in Java instead of calling an external C function? – us2012 Jan 11 '13 at 02:03
  • I did that first. It was just as slow. – bgroenks Jan 11 '13 at 02:05
  • @jdigital A `static` version just changes the object parameter to a class parameter. Nothing is saved. – user207421 Jan 12 '13 at 03:07
  • Would including have any effect on the native call? – bgroenks Jan 12 '13 at 20:18
  • @ghostsoldier23 No, why do you think that? – user207421 Jan 13 '13 at 21:43
  • I didn't really think so. I was just wondering if it was a possibility. – bgroenks Jan 13 '13 at 21:51
  • But why did you wonder? What mechanism are you positing that would cause a slowdown that could be activated by including windows.h? – user207421 Jan 13 '13 at 23:16
  • I guess I was thinking maybe a longer linking process or something for the loaded DLL, but honestly it doesn't seem like a very logical thought in hindsight. – bgroenks Jan 14 '13 at 21:02

2 Answers2

7

Is this normal speed?

No. If you are really getting 50,000-80,000 ns per JNI call, something weird is going on.

What could be causing my native code to be called so much slower?

No idea. It could be almost anything. But if you showed us the native code and the Java code we'd be in a better position to explain.

My money would be on this NOT being a problem with the JNI calls at all. Rather, I expect it is an artefact of the way that you are benchmarking. There are lots of things that you can do (or fail to do) that will cause a Java benchmark to produce spurious results. We need to see your benchmarking code.


OK so, your update indicates that your earlier reported timings (50,000-80,000 or 2000-4000) are incorrect or irrelevant. Timings of 800-1000ns sound plausible, given the following.

I think there are three flaws in your benchmark.

  • You are trying to measure a time interval in the order of a few nanoseconds. But what your measurement is not taking account of is that a call to System.nanoTime() takes a significant time. What you need to do is measure the time it takes to make a few THOUSAND or MILLION JNI calls between each pair of System.nanoTime() calls, then calculate and print the average.

  • Your code doesn't separate the time taken to make the JNI call from the time taken to execute the call body. (Or maybe it does ... and you haven't shown us that code.). I suspect that the gamma correction will take significantly longer than the JNI call overhead.

  • Your warmup is inadequate. It is doubtful that you are running the code for long enough for JIT compilation to kick in. Furthermore, the fact that your benchmark code is confined to a single method call means that even if the JIT compiler did run, the chances are that you'd never call the JIT-compiled version of the method. Put the benchmark code into a method and call the method repeatedly.

Stephen C
  • 698,415
  • 94
  • 811
  • 1,216
  • Also the previous timings weren't necessarily incorrect. I still get those numbers for the first few calls. – bgroenks Jan 11 '13 at 02:43
  • Well yea ... but the way that your question is written, **it looks at first glance** as if those are your "real" numbers. It is not until 2/3rds through the question that you give the actual numbers. Misleading. – Stephen C Jan 11 '13 at 03:07
  • My apologies. I will modify the question. – bgroenks Jan 11 '13 at 04:06
0

Warm up is way too short. A method is jitted after ~10.000 calls. Additionally move the loop body to a method, as there is a difference inbetween "in-place" infinite-loop jitting and jitting a single method.

R.Moeller
  • 3,436
  • 1
  • 17
  • 12