I'm trying to multithread some OpenCV4Android code. I divide a 432x432 image into nine 144x144 segments and pass each to a different thread:

Thread[] threads = new Thread[9];
for (int i = 0; i < 3; i++) {
    for (int j = 0; j < 3; j++) {
        threads[3*i+j] = new Thread(new MyThread(image.rowRange(144*i, 144*(i+1)).colRange(144*j, 144*(j+1))));
        threads[3*i+j].start();
    }
}

for (Thread thread : threads) {
    try {
        thread.join();
    } catch (InterruptedException e) {
        // interruption ignored
    }
}

Here is the thread class:

public class MyThread implements Runnable {
    final Mat block;

    public MyThread(Mat block) {
        this.block = block;
    }

    public void run() {
        /* do image processing on block */
        Mat kernel = Imgproc.getStructuringElement(Imgproc.MORPH_ELLIPSE, new Size(19,19));
        Mat closed = new Mat();
        Imgproc.morphologyEx(block, closed, Imgproc.MORPH_CLOSE, kernel);
        Core.divide(block, closed, block, 1, CvType.CV_32F);
        Core.normalize(block, block, 0, 255, Core.NORM_MINMAX);
        block.convertTo(block, CvType.CV_8UC1);     
        Imgproc.threshold(block, block, -1, 255, Imgproc.THRESH_BINARY_INV+Imgproc.THRESH_OTSU);
    }
}

I have two issues:

  1. Although the threads are modifying the individual blocks correctly, the modifications are not showing up in the final image. This would make sense if Mat block were passed by value to the thread, but Java should be passing its reference to the thread instead.

  2. The runtime is longer than the unthreaded code - in my emulator, it goes up from ~1200 to ~1500 ms. Is this a problem with the emulator, or is multithreading a really bad idea here for some reason?

1''
  • Maybe your emulator only runs one thread at a time, but real hardware might run more. – emrys57 Dec 09 '12 at 23:03
  • Actually, on further Googling, you're right - the emulator [doesn't support multiple threads](http://android.stackexchange.com/questions/8024/improve-android-emulator-performance-on-windows-7-x64). – 1'' Dec 09 '12 at 23:08
  • I don't know why your image isn't updating. I've never used OpenCV, but the documentation I've read in the last 5 minutes definitely suggests that the `Mat` is not copied and is operated on in-place. Sorry, that's as much as I can do, time for bed! – emrys57 Dec 09 '12 at 23:22
  • Do I understand this correctly? Your code works as expected when you do not multi-thread it. When you do multi-thread it and run it in the emulator, it actually runs as it would on a single-core processor, with the threads executing one after another, and that's slower than not multi-threading. But now the image data is not updated properly. Have you tried it on a real target, not the emulator? And what happens if you use the multi-threaded code but reduce the maximum number of threads created to 1? – emrys57 Dec 10 '12 at 07:28
  • No idea about Java, but doesn't that `final Mat block` mean it cannot be modified? If so, the line `this.block = block;` will create a new copy of the Mat object, which will be lost after thread.join(). – Sam Dec 10 '12 at 08:03
  • Sammy, `final Mat block` creates a "blank final", explained at https://en.wikipedia.org/wiki/Final_(Java), which can be initialised precisely once, and only in a constructor method. The `final` applies to the value of the pointer `block` which can only ever point to one specific `Mat` object. However, the data internal to the `Mat` object can be mutable. Nice try, but not, I suspect, the answer! – emrys57 Dec 10 '12 at 09:04
  • In any case, I can take the `final` away with no change. Alright, looks like I'm going to have to debug in a really painful way, since there's no obvious problem I've missed. – 1'' Dec 10 '12 at 16:20
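
The point made in the comments about `final` can be sketched in plain Java. This is a minimal illustration only: the class name `FinalDemo` is hypothetical, and an `int[]` stands in for the `Mat` so the example runs without OpenCV.

```java
// Hypothetical stand-in for MyThread; an int[] plays the role of the Mat.
class FinalDemo {
    final int[] block; // blank final: assigned exactly once, in the constructor

    FinalDemo(int[] block) {
        this.block = block; // copies the reference, not the array contents
    }

    void run() {
        block[0] = 42; // allowed: final fixes the reference, not the data behind it
        // block = new int[1]; // would not compile: block is final
    }

    public static void main(String[] args) {
        int[] shared = {0, 0};
        new FinalDemo(shared).run();
        System.out.println(shared[0]); // the caller's array sees the write
    }
}
```

Since the constructor stores only a reference, `final` on the field does not by itself explain the missing updates, which matches the observation that removing it changes nothing.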

3 Answers

I've no experience with OpenCV, so I'll address only the second issue.

A thread needs a CPU (or a core, which acts as a virtual CPU) to run. So you will never have more threads running simultaneously than the number of cores available on the device.

Let's assume you have a device with 2 cores and you split the work into 9 threads. The result is that only 2 of the 9 threads will run simultaneously, while the remaining 7 wait in the queue for their turn on a CPU.

As there is a cost to thread creation and switching, the overall performance will be worse than with only 2 threads.

If you are splitting the work between threads for performance reasons, don't create more threads than the number of cores in the device.

I believe that most devices on the market are limited to 1 or 2 cores ...
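
One way to follow this advice is to keep the nine blocks as tasks but run them on a pool sized to the core count. The sketch below is only an illustration under stated assumptions: `CoreSizedPool`, `invertBlock` and `processInBlocks` are hypothetical names, and a plain `int[]` with a pixel inversion stands in for the `Mat` and the OpenCV processing so the example runs without OpenCV.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: the class, method names and workload are illustrative only.
class CoreSizedPool {
    // Stand-in workload: inverts every pixel in one square block of a row-major image.
    static void invertBlock(int[] image, int width, int row0, int col0, int size) {
        for (int r = row0; r < row0 + size; r++) {
            for (int c = col0; c < col0 + size; c++) {
                image[r * width + c] = 255 - image[r * width + c];
            }
        }
    }

    // Splits a square image into blocksPerSide x blocksPerSide tiles and runs them
    // on a pool with one thread per available core.
    static void processInBlocks(int[] image, int width, int blocksPerSide) {
        int size = width / blocksPerSide;
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int i = 0; i < blocksPerSide; i++) {
                for (int j = 0; j < blocksPerSide; j++) {
                    final int row0 = i * size;
                    final int col0 = j * size;
                    pending.add(pool.submit(() -> invertBlock(image, width, row0, col0, size)));
                }
            }
            for (Future<?> f : pending) {
                f.get(); // waits for the task and surfaces any failure
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] image = new int[432 * 432];
        processInBlocks(image, 432, 3);
        System.out.println("processed " + image.length + " pixels");
    }
}
```

With this arrangement the number of blocks and the number of threads are independent, so the same code can be measured on devices with different core counts.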

Regards

Luis
  • Maybe I'll try 3 instead of 9, then. [This post](http://stackoverflow.com/a/1718522/1397061) says that more than 1 thread per core is optimal in some situations. – 1'' Dec 10 '12 at 02:04
  • The best way is to test and measure. From your post it looks like the threads share a pre-loaded image (so no I/O in this part). If OpenCV isn't performing any I/O, the best thread count will probably turn out to be the number of cores. – Luis Dec 10 '12 at 12:11

The first problem was caused by converting the block Mat to a different type in this section:

Core.divide(block, closed, block, 1, CvType.CV_32F);
Core.normalize(block, block, 0, 255, Core.NORM_MINMAX);
block.convertTo(block, CvType.CV_8UC1);     

I'm not sure why this should be a problem, but I fixed it by storing the intermediate floating-point matrix in closed and only putting the final answer back into block:

Core.divide(block, closed, closed, 1, CvType.CV_32F);
Core.normalize(closed, block, 0, 255, Core.NORM_MINMAX, CvType.CV_8U);
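
A possible explanation, offered as an assumption and not verified in this thread: as far as I understand, OpenCV allocates a fresh buffer for an output Mat whose existing type or size does not match the result, so writing CV_32F results into the CV_8U `block` would leave `block` pointing at new data that the parent image no longer shares. The toy model below is not OpenCV. `ToyMat` and its methods are hypothetical, and Java arrays stand in for the pixel buffers, purely to illustrate that detachment.

```java
// Toy model, not OpenCV: a view shares its parent's buffer until an operation
// needs a different element type and allocates a fresh buffer for the result.
class ToyMat {
    Object data; // int[] plays the role of CV_8U, float[] the role of CV_32F

    ToyMat(Object data) {
        this.data = data;
    }

    // Like a submatrix header: a new object over the same underlying buffer.
    ToyMat view() {
        return new ToyMat(data);
    }

    // Same element type, so the result is written into the shared buffer.
    void invertInPlace() {
        int[] px = (int[]) data;
        for (int i = 0; i < px.length; i++) {
            px[i] = 255 - px[i];
        }
    }

    // Different element type, so a new buffer replaces the shared one.
    void convertToFloat() {
        int[] px = (int[]) data;
        float[] out = new float[px.length];
        for (int i = 0; i < px.length; i++) {
            out[i] = px[i];
        }
        data = out;
    }

    boolean sharesBufferWith(ToyMat other) {
        return data == other.data;
    }

    public static void main(String[] args) {
        ToyMat image = new ToyMat(new int[] {0, 10, 20});
        ToyMat block = image.view();
        block.invertInPlace();
        System.out.println(block.sharesBufferWith(image)); // still shared
        block.convertToFloat();
        System.out.println(block.sharesBufferWith(image)); // detached
    }
}
```

Under that assumption, keeping the floating-point intermediate in `closed` and writing only the final 8-bit result into `block` means `block` never changes type, so it keeps pointing at the original image data.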
1''

Luis has addressed the second problem. I think the first issue arises because you pass a new Mat to the thread, so modifications to the new Mat will not affect the old one.

I found the source code of rowRange. It calls some native code, but it clearly creates a new object.

public Mat rowRange(int startrow, int endrow)
{
    Mat retVal = new Mat(n_rowRange(nativeObj, startrow, endrow));

    return retVal;
}
faylon
  • It creates a new object, but that object internally points to (some of) the same data as the old object. I'm able to modify `image` through `block` when there is only one thread. – 1'' Dec 10 '12 at 03:16