3

For a school project we need to multithread an image processing algorithm. I decided to divide the height of the image by the number of threads. Each thread loops though the height and width and changes the colors of the pixels. Weirdly though, the sequential version is always much faster. What am I doing wrong?

The method that is executed by the threads:

public synchronized void applyGrayScale(int numberOfThreads) {
    int heightPerThread = imageHeight / numberOfThreads;
    //Set the thread counter
    int threadCounter = this.getCount();
    this.setCount(count + 1);

    /*The height per thread is calculated by the number of threads. We first start at 0. For the next thread we start at heightPerThread * [current thread number]
    So for example; first thread runs from 0 to 80 pixels. The second thread runs from 81 to 160 pixels.
     */
    for (int j = ((heightPerThread - 2) * threadCounter); j < (heightPerThread * (threadCounter + 1) - 1); j++) {
        for (int i = 0; i < imageInput.getWidth() - 1; i++) {
            //Get the RGB value and set it to grayscale
            int rgb;
            int p = RGB.getRGBW(imageInput, i, j);
            rgb = (int) ((((p >> 16) & 0xFF) * 0.2125) + (((p >> 8) & 0xFF) * 0.7154) + ((p & 0xFF) * 0.0721));
            rgb = (rgb << 16) | (rgb << 8) | (rgb);
            //Set the new RGB value per pixel
            imageOutput.setRGB(i, j, rgb);
        }
    }
}

Code that runs the program:

   int threadsAmount = 5;
   final Thread[] threads = new Thread[threadsAmount];

   BufferedImage image = null;
    try {
        image = ImageIO.read(new File("C:/Cat03.jpg"));
    } catch (IOException e) {
        e.printStackTrace();
    }

    //Define the starting time
    long start = System.currentTimeMillis();

    //Create a new grayscale object and set the image
    final GrayscaleParallel grayscaleParallel = new GrayscaleParallel(image);

    //Thread to apply the grayscale with the number of threads
    class grayScaleThread extends Thread {
        @Override
        public void run() {
            grayscaleParallel.applyGrayScale(threadsAmount);
        }
    }

    //Start all threads
    for (int i = 0; i < threadsAmount; i++) {
        threads[i] = new grayScaleThread();
        threads[i].start();
    }

    //Wait for all threads to finish
    for (int i = 0; i < threadsAmount; i++) {
        try {
            threads[i].join();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
    //save result to file
    grayscaleParallel.createImage();

    //Define how long it took
    long end = System.currentTimeMillis();
    float sec = (end - start) / 1000F;
    System.out.println(sec + " seconds parallel");

The output is: 0.897 seconds parallel 0.798 seconds serial

The sequential algorithm:

 for (int j = 0; j < _image.getHeight(); j++) {
        for (int i = 0; i < _image.getWidth(); i++) {
            int rgb;
            int p = RGB.getRGBW(_image, i, j);

            rgb = (int) ((((p >> 16) & 0xFF) * 0.2125) + (((p >> 8) & 0xFF) * 0.7154) + ((p & 0xFF) * 0.0721));
            rgb = (rgb << 16) | (rgb << 8) | (rgb);

            imageOutput.setRGB(i, j, rgb);
        }
    }
    return imageOutput;

When I use a very large image the parallel time seems to improve by 0.5 seconds over the sequential one, but when I don't save the results the parallel algorithm is again slower.

hipy
  • 75
  • 4
  • 2
    Note that one-shot benchmark is not accurate. You can find some information about single/multithreads here: https://stackoverflow.com/questions/6190226/using-parallelism-in-java-makes-program-slower-four-times-slower – jhamon Jun 14 '19 at 07:57
  • You mentioned that it work as expected for large sized images. Have you considered the overheads of context switching? – Anjana Jun 14 '19 at 07:57
  • It could be that the overhead of starting the threads outweighs the benefits of partitioning the work among the threads for sufficiently small images. Also, saving the file to disk is included in your timing and might be the dominating factor. How many cores does your machine have? How many hyperthreads per core does it have? This is a memory-bound problem so the optimal number of threads might be 1 thread per hyperthread. You’ll have to benchmark to figure out what’s best though. Also, you have to run your benchmarks a number of times and calculate some average. Take a look at JMH. – Eric Jun 14 '19 at 08:22
  • A sidenote: `class grayScaleThread` - by convention class names start with an uppercase letter in java. Probably just a typo in the question but wanted to point it out anyway. – Amongalen Jun 14 '19 at 08:34
  • I don’t see where `imageOutput` is declared. – Cris Luengo Jun 14 '19 at 12:53

2 Answers2

3

The problem is that your applyGrayScale() method is synchronized - only one thread can execute it at the same time as all of them run it on the same object. There is no part in your code that could run parallel. So basicly the process is more or less the same as in the sequential variant but you add some extra overhead for context switching and tracking which thread enters the method.

Instead you have to split the image before hand - when creating thread "tell" them which part it should modify. Then change the method from synchronized to normal and let them do their job in parallel.

Amongalen
  • 3,101
  • 14
  • 20
-3

You do nothing wrong!

The bottle-neck in converting the image is not the calculation, it is the memory-access.

With one thread (=sequential) you access the memory in a sequential way. The computer is optimized for such sequential accesses. E.g. by pre loading the next memory addresses if you access an address (memory-bus is much wider than 32 or 64 bits) Also the caching of your CPU predicts sequential accesses because they are very common in programming.

Since the calculation you do is very simple, the slowest part of the algorithm is to read and write the data from/to RAM. If you use more than one thread and access the memory in different (not sequential) addresses, you get a lot of cache-misses and do not benefit from the caching and preloading done by you CPU / memory-bus. That is the reason the performance is worse than the sequential version of your code.

(There might also be some overhead for concurrent reading/writing of the different threads in the multi-threaded version, which might reduce the performance even more)


As mentioned in the comments:

As pointed out by Amongalen (see comment/answer) there might on top of the things I mentioned above be another quite simple reason why there could not be any parallel execution in your implementation.

applyGrayScale() is synchronized and therefore cannot be called in parallel on the same class instance. So if all your threads share the same instance they call applyGrayScale() on, all other threads are wailing while one executes the method.

But I'm sure even after fixing this, the multi-threaded version will still be slower than the sequential one because of the reasons I mentioned above in my answer.

MrSmith42
  • 9,961
  • 6
  • 38
  • 49
  • 1
    `applyGrayScale()` method is `synchronized` - doesn't it mean that only a single thread can modify the image at the same time so there is no benefit from multithreading at all. – Amongalen Jun 14 '19 at 08:23
  • @Amongalen: If the applyGrayScale() is called by all threads on the same instance of the class, you are right. But since not all the code was added to the question I cannot say if this is the case. But if you are right, even fixing this synchronization-issue would most likely not make the multi-threaded version faster than the sequential one because of the reasons I mentioned in my answer. – MrSmith42 Jun 14 '19 at 08:29
  • 1
    I assume that `applyGrayScale()` is the part of `GrayscaleParallel` class which is instantiated few lined above declaration of the `grayScaleThread` worker thread. It seems to me that it is indeed the same instance. – Amongalen Jun 14 '19 at 08:32
  • Multiple threads each have their own cache. This cannot be the reason for the slowdown. – Cris Luengo Jun 14 '19 at 12:58
  • @Cris Luengo: As long as each thread is assigned to a separate cpu-core you are right at least the 1st-level cache is separated. But on one hand your java threads are not always assigned to separate cores and on the other hand e.g. HiperThreading cores might share the cache see https://stackoverflow.com/a/54018751/1921273. In lower cache levels there is even less (for most CPUs no) separation between cores. – MrSmith42 Jun 18 '19 at 10:36