14

In the following SO question: https://stackoverflow.com/questions/2067955/fast-bitmap-blur-for-android-sdk @zeh claims a port of a java blur algorithm to C runs 40 times faster.

Given that the bulk of the code includes only calculations, and all allocations are only done "one time" before the actual algorithm number crunching - can anyone explain why this code runs 40 times faster? Shouldn't the Dalvik JIT translate the bytecode and dramatically reduce the gap to native compiled code speed?

Note: I have not confirmed the x40 performance gain myself for this algorithm, but all serious image manipulation algorithm I encounter for Android, are using the NDK - so this supports the notion that NDK code will run much faster.

Guy
  • 12,250
  • 6
  • 53
  • 70
  • There are so many places where the performance can be lost. If you're interested in these implementations specifically, profile them both and see where the time is spent. – laalto Jan 28 '14 at 07:48

3 Answers3

21

For algorithms that operate over arrays of data, there are two things that significantly change performance between a language like Java, and C:

  • Array bound checking: Java will check every access, bmap[i], and confirm i is within the array bounds. If the code tries to access out of bounds, you will get a useful exception. C & C++ do not check anything and just trust your code. The best case response to an out of bounds access is a page fault. A more likely result is "unexpected behavior".

  • Pointers: You can significantly reduce the operations by using pointers.

Take this innocent example of a common filter (similar to blur, but 1D):

for(int i = 0; i < ndata - ncoef; ++i) {  
    z[i] = 0;  
    for(int k = 0; k < ncoef; ++k) {  
        z[i] += c[k] * d[i + k];  
    }  
}  

When you access an array element, coef[k] is:

  • Load address of array coef into register;
  • Load value k into a register;
  • Sum them;
  • Go get memory at that address.

Every one of those array accesses can be improved because you know that the indexes are sequential. Neither the compiler, nor the JIT can know that the indexes are sequential so they cannot optimize fully (although they keep trying).

In C++, you would write code more like this:

int d[10000];  
int z[10000];  
int coef[10];  
int* zptr;  
int* dptr;  
int* cptr;  
dptr = &(d[0]); // Just being overly explicit here, more likely you would dptr = d;  
zptr = &(z[0]); // or zptr = z;  
for(int i = 0; i < (ndata - ncoef); ++i) {  
    *zptr = 0; 
    *cptr = coef;  
    *dptr = d + i;  
    for(int k = 0; k < ncoef; ++k) {  
        *zptr += *cptr * *dptr;  
        cptr++;  
        dptr++;  
    }  
    zptr++;  
}  
       

When you first do something like this (and succeed in getting it correct) you will be surprised how much faster it can be. All the array address calculations of fetching the index and summing the index and base address are replaced with an increment instruction.

For 2D array operations such as blur on an image, an innocent code data[r,c] involves two value fetches, a multiply and a sum. So with 2D arrays the benefits of pointers allows you to remove multiply operations.

So the language allows real reduction in the operations the CPU must perform. The cost is that the C++ code is horrendous to read and debug. Errors in pointers and buffer overflows are food for hackers. But when it comes to raw number grinding algorithms, the speed improvement is too tempting to ignore.

jdr5ca
  • 2,809
  • 14
  • 25
  • 4
    Also very important: the optimizations performed by the Dalvik JIT are somewhat limited in scope (see http://stackoverflow.com/questions/4912695/what-optimizations-can-i-expect-from-dalvik-and-the-android-toolchain ). Note that array bounds checking may not be a significant performance issue -- if the compiler can determine that the range of possible index values must fall between 0 and N, it can generate code that does a single run-time comparison between the array size and N before entering the loop. – fadden Jan 28 '14 at 18:53
  • @jdr5ca Is that "for{.." a for-loop typo? or is it done so to avoid typing error when the answer is posted in here? or in fact there is a syntax of c++ that allows it? I'm confused. You consistently typed it that way on this answer. – marcius.tan Aug 09 '20 at 16:32
  • 3
    okay, curly brace was wrong and corrected 6 years later. – jdr5ca Aug 19 '20 at 22:17
0

Another factor not mentioned above is the garbage collector. The problem is that garbage collection takes time, plus it can run at any time. This means that a Java program which creates lots of temporary objects (note that some types of String operations can be bad for this) will often trigger the garbage collector, which in turn will slow down the program (app).

JJ McKool
  • 45
  • 4
  • And there are many associated overheads to that. e.g. android compiler puts a `Suspend Check` (checking for GC) instruction in loops, so the app won't 'starve' for heap. This alone, even on a loop with some math ops on a primitive 1D array (where things are really close to a C native version) can cause significant slowdowns. – Paschalis Sep 24 '20 at 00:21
-9

Following is an list of Programming Language based on the levels,

  • Assembly Language ( Machine Language, Lover Level )
  • C Language ( Middle Level )
  • C++, Java, .net, ( Higher Level )

Here Lower level language has direct access to the Hardware. As long as the level gets increased the access to the hardware gets decrease. So Assembly Language's code runs at the highest speed while other language's code runs based on their levels.

This is the reason that C Language's code run much faster than the Java's code.

Lucifer
  • 29,392
  • 25
  • 90
  • 143
  • 3
    This is wrong. I definitely wouldn't put C++ at the same level than Java or C#. Sure C++ offers a great number of paradigms but it is way closer to C than to everything else when comparing the resulting binary. Especially as Java and .NET are managed. Also, performance is not only a matter of "how close to the metal" a langage is. If you have a poor algorithm, it will be slow no matter what langage you pick. – ereOn Jan 28 '14 at 07:37
  • 2
    Language level isn't the reason. Being compiled vs. being interpreted is the main performance driver here, assuming the algorithms and their memory access patterns (caching!) are about the same. – laalto Jan 28 '14 at 07:42
  • C++ and C generate assembly code, while Java, c# generate code executed by an intermediate interpreter. That's where the main difference is. The above list is not correct. C/C++ are on the same "level". Still does not answer the question: on modern VMs with JIT calculations should run about 10-30%% slower than compiled code even with garbage collection runs in-between, sometimes not even that. Surely not x4 slower. – Guy Jan 28 '14 at 09:02