Optimizing nested for loop in java

Question

I'm becoming crazy by trying to optimize the following function in java with OpenCV:

static Mat testPossibleCentersFormula(int x, int y, Mat weight, double gx, double gy, Mat outSum){
    Mat out = outSum;//new Mat(weight.rows(), weight.cols(), CvType.CV_64F);
    float weight_array [] = new float [weight.rows()*weight.cols()];
    weight.get(0,0,weight_array);
    double out_array [] = new double [weight.rows()*weight.cols()];
    out.get(0,0,out_array);

    for (int cy = 0; cy < out.rows(); ++cy) {
         for (int cx = 0; cx < out.cols(); ++cx) {


             if (x == cx && y == cy) {
                    continue;
                  }
            // create a vector from the possible center to the gradient origin
              double dx = x - cx;
              double dy = y - cy;
              // normalize d
              double magnitude = Math.sqrt((dx * dx) + (dy * dy));
              dx = dx / magnitude;
              dy = dy / magnitude;
              double dotProduct = dx*gx + dy*gy;
              dotProduct = Math.max(0.0,dotProduct);
           // square and multiply by the weight
              if (kEnableWeight) {
                  out_array[cy*out.cols()+cx] =  out_array[cy*out.cols()+cx] +dotProduct * dotProduct * (weight_array[cy*out.cols()+cx]/kWeightDivisor);

              } else {
                  out_array[cy*out.cols()+cx] =  out_array[cy*out.cols()+cx] +dotProduct * dotProduct;

              }
    } }


     out.put(0, 0, out_array);
     return out;
         }

The function accesses some pictures' values pixel by pixel, for each frame in a video, and makes it impossible to use it in real time.

I've already converted the Mat operations into array operations, and that has made a great difference, but it is still very very slow. Do you see any way to replace the nested for loop?

Thank you very much,

I don't how if it's a big difference, but declare the variables ooutside the loop. Also try to implement your own `Max` function with simple `if`, it might be faster than calling an external function. — TDG, Jul 08 '16 at 07:13
@TDG it makes no difference to performance, and puts the variables in an unnecessarily broad scope. — Andy Turner, Jul 08 '16 at 07:14
I am very suspicious about what you're doing in the lines `float weight_array [] = new float [weight.rows()*weight.cols()];` and `weight.get(0,0,weight_array);`: you're simply getting the values of a single pixel. If you're processing an image, I would not expect there to be `rows*cols` channels in the image. e.g. if it is an RGB image, there should only be 3. Or is this some convoluted way to get all of the pixel values? — Andy Turner, Jul 08 '16 at 07:15
Thanks @AndyTurner. But I thought I was just putting the "weight_array" value in the upper left corner of the weight mat... I mean, not a single pixel, but the whole image. and rous*cols is not the channel size, but the size of the array to store all the pixel values of the weight mat.. — Alb_a, Jul 08 '16 at 07:34
@AndyTurner I would beg to differ on that it makes no difference to performance. I wouldn't do this work in a static method anyways but construct an object containing the buffers. To the answer question: The work your code does can easyly be parallelized. Also it seems like you could divide your weight array by kWeightDivisor upfront. Try to avoid using so many ifs inside the loop, it would be faster if you made that decision outside the loop. Do you need to switch the kEnableWeight pixelwise? — Gilfoyle, Jul 08 '16 at 08:58
@Gilfoy try decompiling the method with the variables declared inside and outside the loops. There will be no difference in the bytecode. See http://stackoverflow.com/a/8878071/3788176 for an example. — Andy Turner, Jul 08 '16 at 08:59
@AndyTurner This post is about DECLARING a variable outside of the loop. The allocation sill takes place inside the loop. The allocation is the time consuming step an Alb_a should allocate buffers outside the function and reuse them. — Gilfoyle, Jul 08 '16 at 09:07
@Gilfoy all of the `new` operators are outside the loops. And I was responding to TDG's statement "declare the variables ooutside the loop"; the only variables declared in the loop are `double` anyway; but whether they are declared inside or outside the loop makes zero difference, other than broadening the variable scope. — Andy Turner, Jul 08 '16 at 09:08
@AndyTurner This function is to analyze video frames and will be called at the video frame rate, like 15 to 60fps. IMHO it's no use to allocate new buffers for every frame. — Gilfoyle, Jul 08 '16 at 09:13
@Gilfoy oh when you say "the" loop, you're referring to a loop we can't see. Right. Yes, that will be true; I was talking about the loops shown in the question. — Andy Turner, Jul 08 '16 at 09:17
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/116779/discussion-between-gilfoy-and-andy-turner). — Gilfoyle, Jul 08 '16 at 09:20

Andy Turner · Answer 1 · 2016-07-08T09:39:17.127

As I have alluded to in my comment above, I think that the allocation of weight_array and out_array is very suspicious: whilst the Javadoc that I can find for Mat is unhelpfully silent on what is put into an array larger than the image depth when you call mat.get(...), it feels like an abuse of the API to assume that it will return the entire image's data.

Allocating such large arrays each time you call the method is unnecessary. You can allocate a much smaller array, and just reuse that on each iteration:

float[] weight_array = new float[weight.depth()];
double[] out_array = new double[out.depth()];

for (int cy = 0; cy < out.rows(); ++cy) {
  for (int cx = 0; cx < out.cols(); ++cx) {
    // Use weight.get(cx, cy, weight_array)
    // instead of weight_array[cy*out.cols()+cx].

    // Use out.get(cx, cy, out_array) and out.put(cx, cy, out_array)
    // instead of out_array[cy*out.cols()+cx] += ...
  }
}

Note that this does still allocate (probably very small) arrays on each iteration. If you needed to, you could allocate the weight_array and out_array outside the method, and pass them in as parameters; but I would try as suggested here first, and optimize further when/if necessary.

Optimizing nested for loop in java

1 Answers1