
I am processing a number of 4K images by calculating a parameter on small (64×64 pixel) patches of each image. The task is currently carried out sequentially, one patch at a time. A snippet of my code is copied below to show the idea.

for (int i = 0; i < imageW / pSize; i++) {
  for (int j = 0; j < imageH / pSize; j++) {
    thisPatch = MatrixUtil.getSubMatrixAsMatrix(image, i * pSize, j * pSize, pSize);
    results[i][j] = computeParamForPatch(thisPatch);
  }
}

I now need to parallelize this to save some time. As you can see, the processing of each patch is completely independent of all the others. To keep each result associated with its patch, I would either have to remember the location of each patch in a Map (something like Map<Point, double[][]>) or use forEachOrdered(), and I am not sure the map approach can actually be parallelized. So this is my question: apart from using forEachOrdered(), which hurts performance, is there any other way to process an image in parallel?
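For what it's worth, a keyed result map can be filled from a parallel stream. The following is only a rough sketch, not code from my project; it assumes a java.awt.Point key, the same image, pSize and MatrixUtil.getSubMatrixAsMatrix as above, and that computeParamForPatch returns a double:

// Imports assumed at the top of the file:
// import java.awt.Point;
// import java.util.concurrent.ConcurrentHashMap;
// import java.util.concurrent.ConcurrentMap;
// import java.util.stream.IntStream;

ConcurrentMap<Point, Double> resultMap = new ConcurrentHashMap<>();
int patchesX = imageW / pSize;
int patchesY = imageH / pSize;

IntStream.range(0, patchesX * patchesY).parallel().forEach(i -> {
    int px = i % patchesX;   // patch column
    int py = i / patchesX;   // patch row
    double value = computeParamForPatch(
            MatrixUtil.getSubMatrixAsMatrix(image, px * pSize, py * pSize, pSize));
    // ConcurrentHashMap handles concurrent puts from the parallel stream safely.
    resultMap.put(new Point(px, py), value);
});

In the end, writing straight into a pre-sized 2D array (as in the update below) avoids the map entirely, since every patch writes to its own cell.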


One solution: I tried the following code (suggested by @DHa), which gives a significant improvement:

int outputW = imageW / pSize;
int outputH = imageH / pSize;

IntStream.range(0, outputW * outputH).parallel().forEach(i -> {
  int x = i % outputW;
  int y = i / outputW;
  tDirectionalities[x][y] = computeDirectionalityForPatch(
          MatrixUtil.getSubMatrixAsMatrix(image, x * pSize, y * pSize, pSize));
});

Results:

  • Sequential: 15754 ms
  • Parallel: 5899 ms
Azim
  • ExecutorService (see the sketch after these comments) – DimXenon Apr 29 '18 at 15:42
    Why parallelize *this*? It seems like it'd be a lot easier to process the *images* in parallel instead of trying to process each image in parallel while still processing the images sequentially. – Andrew Henle Apr 29 '18 at 15:43
  • @AndrewHenle For now, that would add a level of difficulty to the project that we want to avoid. What I explained here is not the entire project. – Azim Apr 29 '18 at 15:47
    @AndrewHenle because then IO would be even more of a bottleneck. – Mad Physicist Apr 29 '18 at 16:28
  • @MadPhysicist *because then IO would be even more of a bottleneck.* How can that possibly be known if it hasn't been tried? – Andrew Henle Apr 29 '18 at 16:42
    A 4K image is heavy on memory, so processing them one at a time may well be beneficial. – DHa Apr 29 '18 at 16:59
    @DHa how are some 20 MB "heavy on memory"? – Turing85 Apr 29 '18 at 17:01
  • @Turing85 Depends on the hardware; we don't have that constraint given here. I regularly work in environments with 128MB available in total. – DHa Apr 29 '18 at 17:05
  • @DHa Do you do image processing with Java in those environments? – Mad Physicist Apr 29 '18 at 19:15
  • @Mad Physicist Yes :) Turn the question around if you prefer: why waste memory when that gives no benefit? I happily 'waste' loads of memory if that balances well against an increase in performance. Here I can see no such benefit; it is more likely to worsen performance. You could also see it from a UI perspective: would you rather have results after 5/10/15 s, or all three after 15 s? – DHa Apr 30 '18 at 04:39
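For completeness, the ExecutorService approach that DimXenon alludes to above would look roughly like the sketch below. This is not code from the question or the answer; it assumes the same image, pSize, MatrixUtil.getSubMatrixAsMatrix and computeParamForPatch as the question, and that computeParamForPatch returns a double:

// Imports assumed at the top of the file:
// import java.util.ArrayList;
// import java.util.List;
// import java.util.concurrent.Callable;
// import java.util.concurrent.ExecutorService;
// import java.util.concurrent.Executors;

int patchesX = imageW / pSize;
int patchesY = imageH / pSize;
double[][] results = new double[patchesX][patchesY];

ExecutorService pool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
List<Callable<Void>> tasks = new ArrayList<>();
for (int x = 0; x < patchesX; x++) {
    for (int y = 0; y < patchesY; y++) {
        final int px = x, py = y;       // effectively final copies for the lambda
        tasks.add(() -> {
            // Each task writes to its own cell, so no synchronization is needed on results.
            results[px][py] = computeParamForPatch(
                    MatrixUtil.getSubMatrixAsMatrix(image, px * pSize, py * pSize, pSize));
            return null;
        });
    }
}
try {
    pool.invokeAll(tasks);              // blocks until every patch has been processed
} catch (InterruptedException e) {
    Thread.currentThread().interrupt(); // restore the interrupt flag
} finally {
    pool.shutdown();
}

Whether this beats a parallel stream is largely a matter of taste here; both run one task per patch on a fixed pool of worker threads.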

1 Answer


This solution uses a parallel stream.

See also How many threads are spawned in parallelStream in Java 8 for how to control the number of threads that work on the stream simultaneously.

    int patchWidth = (int) Math.ceil((double) imageW / pSize);
    int patchHeight = (int) Math.ceil((double) imageH / pSize);

    IntStream.range(0, patchWidth * patchHeight).parallel().forEach(i -> {
        int x = i % patchWidth;
        int y = i / patchWidth;

        // Each patch is extracted and processed independently; the result cells are disjoint,
        // so writing into the shared array from multiple threads is safe here.
        results[x][y] = computeParamForPatch(
                MatrixUtil.getSubMatrixAsMatrix(image, x * pSize, y * pSize, pSize));
    });
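To control how many worker threads actually process the stream (the subject of the linked question), one widely used option is to submit the parallel stream to a dedicated ForkJoinPool. The following is only a sketch, not part of the answer above; it reuses the variables from the code above, and it relies on behaviour of the Fork/Join framework rather than a documented guarantee:

    // Imports assumed at the top of the file:
    // import java.util.concurrent.ExecutionException;
    // import java.util.concurrent.ForkJoinPool;
    // import java.util.stream.IntStream;

    int threads = Runtime.getRuntime().availableProcessors(); // pick the parallelism you want
    ForkJoinPool pool = new ForkJoinPool(threads);
    try {
        // The stream's tasks run in the pool that submits them, so this bounds the worker count.
        pool.submit(() ->
                IntStream.range(0, patchWidth * patchHeight).parallel().forEach(i -> {
                    int x = i % patchWidth;
                    int y = i / patchWidth;
                    results[x][y] = computeParamForPatch(
                            MatrixUtil.getSubMatrixAsMatrix(image, x * pSize, y * pSize, pSize));
                })
        ).get();                        // wait for all patches to finish
    } catch (InterruptedException | ExecutionException e) {
        throw new RuntimeException(e);
    } finally {
        pool.shutdown();
    }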
DHa
  • I made a silly mistake in my test (this is why I removed my comment here). In fact, the parallel version you proposed indeed makes it significantly faster. See my update. – Azim Apr 29 '18 at 22:15
    @Azim From the timing results you've given, it looks like you are now employing 3 cores on it. parallel() will default to using all cores minus one, so if you use the techniques in the link to increase that to all cores, you could improve the result a little further, at the cost of possibly starving other threads of CPU time. Perhaps a better approach is to use the remaining core for the IO operations that handle input/output to this function. – DHa Apr 30 '18 at 04:44