14

I came across this in answering another question. I was trying to diagnose which code change had a greater effect on the speed. I used a boolean flag in a for loop to switch between using helper methods to construct a Color.

The interesting behavior is that when I decided which one was faster and removed the if the speed of the code amplified 10x. Taking 140ms before and just 13ms afterward. I should only be removing one calculation out of about 7 from the loop. Why such a drastic increase in speed?

Slow code: (runs in 141 milliseconds when helperMethods is false) *See edit 2

public static void applyAlphaGetPixels(Bitmap b, Bitmap bAlpha, boolean helperMethods) {
    int w = b.getWidth();
    int h = b.getHeight();
    int[] colorPixels = new int[w*h];
    int[] alphaPixels = new int[w*h];
    b.getPixels(colorPixels, 0, w, 0, 0, w, h);
    bAlpha.getPixels(alphaPixels, 0, w, 0, 0, w, h);
    for(int j = 0; j < colorPixels.length;j++){
        if(helperMethods){
            colorPixels[j] = Color.argb(Color.alpha(alphaPixels[j]), Color.red(colorPixels[j]), Color.green(colorPixels[j]), Color.blue(colorPixels[j]));
        } else colorPixels[j] = alphaPixels[j] | (0x00FFFFFF & colorPixels[j]);
    }
    b.setPixels(colorPixels, 0, w, 0, 0, w, h);
}

Fast Code: (Runs in 13ms)

public static void applyAlphaGetPixels(Bitmap b, Bitmap bAlpha) {
    int w = b.getWidth();
    int h = b.getHeight();
    int[] colorPixels = new int[w*h];
    int[] alphaPixels = new int[w*h];
    b.getPixels(colorPixels, 0, w, 0, 0, w, h);
    bAlpha.getPixels(alphaPixels, 0, w, 0, 0, w, h);
    for(int j = 0; j < colorPixels.length;j++){
        colorPixels[j] = alphaPixels[j] | (0x00FFFFFF & colorPixels[j]);
    }
    b.setPixels(colorPixels, 0, w, 0, 0, w, h);
}

EDIT: It seems the issue is not with the fact that the if is inside the loop. If I elevate the if outside of the loop. The code runs slightly faster but still at the slow speeds with 131ms:

public static void applyAlphaGetPixels(Bitmap b, Bitmap bAlpha, boolean helperMethods) {
    int w = b.getWidth();
    int h = b.getHeight();
    int[] colorPixels = new int[w*h];
    int[] alphaPixels = new int[w*h];
    b.getPixels(colorPixels, 0, w, 0, 0, w, h);
    bAlpha.getPixels(alphaPixels, 0, w, 0, 0, w, h);
    if (helperMethods) {
        for (int j = 0; j < colorPixels.length;j++) {
            colorPixels[j] = Color.argb(Color.alpha(alphaPixels[j]),
                                        Color.red(colorPixels[j]),
                                        Color.green(colorPixels[j]),
                                        Color.blue(colorPixels[j]));
        }
    } else {
        for (int j = 0; j < colorPixels.length;j++) {
             colorPixels[j] = alphaPixels[j] | (0x00FFFFFF & colorPixels[j]);
        }
    }

    b.setPixels(colorPixels, 0, w, 0, 0, w, h);
}

EDIT 2: I'm dumb. Really really dumb. Earlier in the call stack I used another boolean flag to switch between between using this method and using another method that uses getPixel instead of getPixels. I had this flag set wrong for all of my calls that have the helperMethod parameter. When I made new calls to the version without helperMethod I did it correct. The performance boost is because of getPixels not the if statement.

Actual Slow code:

public static void applyAlphaGetPixel(Bitmap b, Bitmap bAlpha, boolean helperMethods) {
    int w = b.getWidth();
    int h = b.getHeight();
    for(int y=0; y < h; ++y) {
        for(int x=0; x < w; ++x) {
            int pixel = b.getPixel(x,y);
            int finalPixel;
            if(helperMethods){
                finalPixel = Color.argb(Color.alpha(bAlpha.getPixel(x,y)), Color.red(pixel), Color.green(pixel), Color.blue(pixel));
            } else{
                finalPixel = bAlpha.getPixel(x,y) | (0x00FFFFFF & pixel);
            }
            b.setPixel(x,y,finalPixel);
        }
    }
}

Note:All speeds are an average of 100 runs.

Community
  • 1
  • 1
Fr33dan
  • 4,227
  • 3
  • 35
  • 62
  • 1
    The if-statement probably makes the code a lot harder to optimize. – Mysticial Sep 02 '12 at 23:11
  • 2
    Branching / Branch prediction ? http://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-an-unsorted-array – auselen Sep 02 '12 at 23:12
  • Use Traceview to determine exactly what is taking up your time. – CommonsWare Sep 02 '12 at 23:18
  • 1
    Traceview will probably fool you since that will disable JIT. – auselen Sep 02 '12 at 23:19
  • 1
    please provide much more detailed information. In which case is what how fast? Just removed the if or the if and the body of it? – WarrenFaith Sep 02 '12 at 23:20
  • Btw, aren't you suppose to use colorPixels[j] = alphaPixels[j] << 24 | (0x00FFFFFF & colorPixels[j]); instead of colorPixels[j] = alphaPixels[j] | (0x00FFFFFF & colorPixels[j]); – auselen Sep 02 '12 at 23:20
  • 1
    `helperMethods` is true/false for the entire loop. So it can't be branch prediction. – Mysticial Sep 02 '12 at 23:21
  • @auslen No because then I would be setting the trailing bits to the alpha bits. I want use the bits in the same location – Fr33dan Sep 02 '12 at 23:21
  • 1
    A good enough JIT may be able to notice in the fast code that all you are doing is copying every fourth byte. – Neil Sep 02 '12 at 23:22
  • To be clear: are you comparing times when `helperMethods` is `true` or `false` against the "fast code"? – Sam Sep 02 '12 at 23:28
  • @Sam I that is what I created the method to do. Running it with it `false` ran about 6ms faster than with it `true`. The time I'm comparing the fast code to is the time when it is `false` – Fr33dan Sep 02 '12 at 23:29
  • Which android version is this? – auselen Sep 02 '12 at 23:41
  • Android 4.0.4, The times I gave were on an Optimus S with an AOSP build. – Fr33dan Sep 02 '12 at 23:44
  • @Fr33dan does manual loop unrolling improve the speed? – obataku Sep 03 '12 at 00:13
  • @veer I'm not sure how I could cleanly unroll it without adding if statements. – Fr33dan Sep 03 '12 at 00:16
  • I run this on my device and I get similar numbers, I don't see a big difference like 10x. – auselen Sep 03 '12 at 00:54
  • That is very peculiar. Can you try installing my APK so we can figure out if it's my device or my code? http://fr33dan.com/AndroidSandbox.apk – Fr33dan Sep 03 '12 at 01:12
  • 1
    You should be able to answer this question for yourself, by [*this method*](http://stackoverflow.com/a/317160/23771). If you're seeing a 10:1 difference in speed, that means the slow one is spending 90% of its time doing something else. So if you just interrupt it a few times, the chance is 90% that you will see the problem each time you interrupt it. (I would want to double-check the possibility that `helperMethods` might actually be *true*.) – Mike Dunlavey Sep 03 '12 at 01:22
  • @MikeDunlavey Your solution would work on a desktop environment but there I'm not aware of a way to do it on an android device. Also this method assumes that the stack is being taken away from my code. If this was the case would it have not shown up when I ran through the code with the debugger? – Fr33dan Sep 03 '12 at 01:40
  • @Fr33dan: Sorry, I'm no Android expert. Not sure I understand what you mean by stack being taken away. Regardless, isn't there a way to run Android code under an emulator or something, so you can interrupt it and really see what's going on? Or do they just make it a *&*&$& black box? – Mike Dunlavey Sep 03 '12 at 01:53
  • Numbers were ~200 versus ~15. Can you share the source code somewhere? – auselen Sep 03 '12 at 06:15
  • @auselen Interesting, this must mean the `Bitmap` setup must have some effect on how the execution runs, which makes even less sense to me since that code was consistent. Here is a copy of the source that generates the problem: http://www.fr33dan.com/AndroidSandbox.zip – Fr33dan Sep 03 '12 at 09:22
  • 1
    @Fr33dan It is probably your timing. You start timer too early, and load bitmaps in the tests which are getting effected by GCs of previous test runs. You update ui while next test is running. When I clear your code to get just what you describe above, in the result I see similar numbers, no big difference. – auselen Sep 03 '12 at 10:32

4 Answers4

4

Try hoisting the condition out of the loop:

if (helperMethods) {
    for (int j = 0; j < colorPixels.length;j++) {
        colorPixels[j] = Color.argb(Color.alpha(alphaPixels[j]),
                                    Color.red(colorPixels[j]),
                                    Color.green(colorPixels[j]),
                                    Color.blue(colorPixels[j]));
    }
} else {
    for (int j = 0; j < colorPixels.length;j++) {
         colorPixels[j] = alphaPixels[j] | (0x00FFFFFF & colorPixels[j]);
    }
}
Neil
  • 54,642
  • 8
  • 60
  • 72
2

In the 'fast code' you never run the statement

colorPixels[j] = Color.argb(Color.alpha(alphaPixels[j]), Color.red(colorPixels[j]), Color.green(colorPixels[j]), Color.blue(colorPixels[j])); 

But in the 'slow code' if the boolean is set to true at least once you run this addiotional statement that makes the time longer. If your condition is always false then the if statement is checked about 7 times in each iteration through the loop. Try to place the if outside the loop.

Marcin S.
  • 11,161
  • 6
  • 50
  • 63
2

It is probably your profiling code, that confusing you. Try to isolate the part of the code that you want to profile and just measure that part, avoid GCable operations like creating Bitmaps in your cases.

If I call your test code with

testing.loadDrawable(this, false, true, false)

it runs slow. But if I call it with

testing.loadDrawable(this, true, true, false)

That's a similar (still worse) number. So useGetPixels makes all the difference. Which I guess gets the Bitmap data into a local buffer and set the results later.

auselen
  • 27,577
  • 7
  • 73
  • 114
  • If the problem was the GC wouldn't that mean only the first run would be fast since only the later ones would be bogged down by the GC clearing the earlier test? Also this is the biggest performance impact I've heard of as a result of the GC and it seems to have come out of nowhere. How can I predict when the GC will cause such massive problems? This feels like I'm getting close to the answer, but I'd still like to know exactly what's going on. – Fr33dan Sep 03 '12 at 14:23
  • 1
    In this answer I don't claim problem is due to GCs but your code is too messy and doesn't do the profiling correctly. I believe your problem is `useGetPixels`, when you enable it run time increases. Otherwise `if` as you clain doesn't change run time. – auselen Sep 03 '12 at 17:19
  • So I'm confused. You are saying getPixels had an increased run time? For me it was faster every time. You're the "fast" version is `loadDrawableHard(this,true,true)`. I changed the name so I would be able to tell easily which was the hard coded one. In this version all the code is the same except the `applyAlphaGetPixelsHard()` and 'applyGetPixelHard()` are called instead. If I run it with `useGetPixels` set to false I don't see the same improvement. I originally wrote `loadDrawableHard()` to only use `applyAlphaGetPixelsHard()` but added `applyGetPixelHard()` to remove a possible differences. – Fr33dan Sep 03 '12 at 17:56
  • I cannot believe what I did. If you look at all my calls they all have `useGetPixels` set to false. Nothing wrong with my profiling code, I've just been profiling the wrong thing the whole stupid time. – Fr33dan Sep 03 '12 at 18:18
  • This was all I tried to say from the beginning however I guess you had to see it to understand. When I say your profiling code is confusing you, I wanted to say that code was too messy, making it for hard to see how you call those functions. GC thing was just a side notice. – auselen Sep 03 '12 at 18:33
0

In your fast code you don't use class Color at all. I assume initialization of this class takes some time, it has lots of static method and static code.

You can try to do following: make sure Color class is fully loaded and initialized before making your test (you can call any static method from Color class before calling your applyAlphaGetPixels() method). Then run your test and compare results.

HitOdessit
  • 7,198
  • 4
  • 36
  • 59
  • Wouldn't this only effect the first time the test is run within an instance of the program? – Fr33dan Sep 03 '12 at 14:25
  • Most likely it should affect only first run in the production program, but it could affect differently in your test runs – HitOdessit Sep 03 '12 at 14:37