2

I'm writing a ray tracing program in Java and have implemented multithreading using Runnable interface. Each thread renders a portion of the 800 vertical lines. When using two threads, they will render 400 lines each. For 8 threads, 100 lines each, and so on.

My solution is currently working, but the render time doesn't decrease when more threads are working in parallel. My CPU has 8 threads, and the usage is not 100% when rendering on 8 threads.

class Multithread implements Runnable {
  Camera camera;
  CountDownLatch latch;
  ...

  //Constructor for thread
  Multithread(Scene s, Camera c, int thread, int threadcount, CountDownLatch cdl){
      camera = c;
      latch = cdl;
      ...
  }

  public void run(){
      try{
          ...
          //This is the render function
          camera.render(...);

          //When all threads unlatch, main class will write PNG
          latch.countDown();
      }
      catch (Exception e){System.out.println ("Exception is caught");}
  }
}
public class Camera {
    //The final pixel values are stored in the 2D-array
    ColorDbl[][] finalImage;
    
    Camera(int w){
        Width = w;
        finalImage = new ColorDbl[w][w]
    }

    //Start rendering
    void render(Scene S, int start, int end){

        //Create temporary, partial image
        ColorDbl[][] tempImage = new ColorDbl[Width][Width];

        Ray r;
        ColorDbl temp;
        //Render lines of pixels in the interval start-end
        for(int j = start; j < end; ++j){
            for(int i = 0; i < Width; ++i){
                r = new Ray(...);
                temp = r.CastRay(...);
                tempImage[i][j] = temp;
            }
        }

        //Copy rendered lines to final image
        for(int j=start; j<end; ++j){
            for(int i=0; i<Width; ++i){
                finalImage[i][j] = tempImage[i][j];
            }
        }
    }

    public static void main(String[] args) throws IOException{
        //Create camera and scene
        Camera camera = new Camera(800);
        Scene scene = new Scene();

        //Create threads
        int threadcount = 4;
        CountDownLatch latch = new CountDownLatch(threadcount);
        for (int thread=0; thread<threadcount; thread++){
            new Thread(new Multithread(scene, camera, thread, threadcount, latch)).start();
        }

        //Wait for threads to finish
        try{
          latch.await();
        }catch(InterruptedException e){System.out.println ("Exception");}

        //Write PNG
        c.write(...);
    }
}

When using 2 threads instead of 1, I expect almost a doubling of render speed, but instead, it takes 50% longer. I don't expect anyone to solve my issue, but I would really appreciate some guidance when it comes to implementing multithreading. Am I going about this the wrong way?

Karl A
  • 21
  • 5
  • A useful lesson: often-times multithreading does *not* speed processing speed. – Hovercraft Full Of Eels Jun 18 '19 at 00:23
  • Sometimes the overhead of managing multiple threads exceeds the performance gains of multithreading in the first place – EDToaster Jun 18 '19 at 00:44
  • You are probably right. But from my knowledge, Ray Tracing is a workload that should greatly benefit from multithreading if done right. Do you have any advice on how to find where the overhead is so that I can try and reduce it? – Karl A Jun 18 '19 at 08:27

2 Answers2

0

In the source code that you posted, I do not see an obvious bottleneck. When parallel code is running slower, the most common explanations are either overhead because of synchronization, or doing extra work.

When it comes to synchronization, high congestion can make parallel code run very slowly. It can mean threads (or processes) are fighting over limited resources (e.g., waiting for locks), but it can also be more subtle like accessing the same memory using atomic operations, which can become quite costly. In your example, I did not see anything like that. The only synchronization operation seem to be the countdown latches at the end, which should not be significant. Unequal workloads can also harm scalability, but it seems unlikely in your example.

Doing extra work could be an issue. Maybe you are copying more data in the parallel version than in the sequential one? That could explain some overhead. Another guess is that in the parallel version, the cache locality has been negatively impacted. Note that the effect of the cache is significant (as a rule of thumb, memory accesses can become a factor of 50-100 times slower when your workload no longer fits in the cache).

How to find your bottleneck? In general, that is called profiling. There are specialized tools, for example, VisualVM is a free tool for Java that can be used as a profiler. Another even simpler, but often very effective first approach is to run your program and take some random thread dumps. If you have an obvious bottleneck, it is likely that you will see it in the stack trace.

The technique is often called poor man's profiler, but I found it very effective (see this answer for more details). In addition, you can also apply it safely in production, so it is a neat trick when you have to optimize code that you cannot run on your local machine.

IDE's (like Eclipse or IntelliJ) have support to take Thread dumps but you can also trigger it directly from the command line if you know the process id:

 kill -3 JAVA_PID

The program (or the JVM that runs it) will then print the current stack trace of all current threads. If you repeat that a couple of times, you should get an idea where your program is spending most of its time.

You can also compare it with your sequential version. Maybe you notice some pattern that explains the overhead of the parallel version.

I hope that helped a bit to get started.

Philipp Claßen
  • 41,306
  • 31
  • 146
  • 239
  • 1
    Thanks a lot for the advice! I tried VisualVM (awesome tool!) and my threads are constantly blocked, but I don't know why and I don't understand the threaddump ...Thread.State: BLOCKED (on object monitor) at java.util.Vector$Itr.next(java.base@11.0.3/Vector.java:1275) - waiting to lock <...> (a java.util.Vector) at Scene.ObjectHit(Scene.java:112) at Scene.CastRay(Scene.java:147) at Scene.CastRay(Scene.java:206) at Scene.render(Scene.java:236) at Multithread.run(Multithread.java:16) at java.lang.Thread.run(java.base@11.0.3/Thread.java:834) Locked ownable synchronizers: - None – Karl A Jun 20 '19 at 00:34
0

I fixed the issue, and I finally understand why it didn't work.

By doing some debugging using VisualVM I noticed that all threads but one were blocked at all time. My initial workaround was to duplicate the Scene object that was passed to every thread. It solved the issue but it wasn't elegant and it didn't make sense to me. It turns out the real solution is much simpler.

I was using Vector<> as a container for the geometry in my scene class. Vector<> is a synchronized container that doesn't allow multiple threads to access it at the same time. By placing all the objects in the scene in an ArrayList<> instead, I get much cleaner code, less memory usage, and better performance.

VisualVM was critical for finding the blocking, and I thank Philipp Claßen for the advice since I would have never resolved this otherwise.

Karl A
  • 21
  • 5
  • Good to know that it has worked now. However, having to keep a separate copy of a resource for each thread just to avoid locks seems to be a problem, specially if the resource is large (e.g. a large scene to be rendered). Since we are just reading scene data, we are not writing to it, would it be possible to avoid the JVM from creating locks over this resource specifically? (I am new to Java also) –  Sep 21 '19 at 16:58