Multi-threading in Java (Parallel Code MUCH Slower than Serial)

Question

I create four threads and pass each one the range of a matrix row it should loop over in order to do some operation. Essentially, I want to take a for loop and split its work across four threads.

    GE_threaddiv t = new GE_threaddiv(k + 1, toPass + (k + 1), k, A[k][k], "1");
    GE_threaddiv t2 = new GE_threaddiv(toPass + (k + 1), toPass * 2 + (k + 1), k, A[k][k], "2");
    GE_threaddiv t3 = new GE_threaddiv(toPass * 2 + (k + 1), toPass * 3 + (k + 1), k, A[k][k], "3");
    GE_threaddiv t4 = new GE_threaddiv(toPass * 3 + (k + 1), toPass * 4 + (k + 1), k, A[k][k], "4");
    t.start();
    t2.start();
    t3.start();
    t4.start();
    try {
        t.join();
        t2.join();
        t3.join();
        t4.join();
    } catch (InterruptedException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }

Each thread runs a for loop over its range, with a being the start and b the end of the segment (both passed in when the thread is created). A is a global matrix, c is the row index, and temp is a value from A passed into the thread on creation.

    public void run()
    {
        try
        {
            for (int j = a; j < b; j++) {
                A[c][j] = A[c][j] / temp;
            }
        }
        catch (Exception e)
        {
            System.out.println("Exception is caught");
        }
    }

My implementation works, but it is drastically slower (by orders of magnitude) than running the for loop in serial. The larger the data set, the slower it runs. The threads are confirmed to be running side by side. My guess is that the loss of efficiency comes from how each thread accesses memory. Any help would be greatly appreciated!

Your code is a little hard to follow. I'd recommend making the example a little more clear with the parameters being used here. — Tyler, Oct 19 '18 at 19:04
What does "drastically slower" mean? Show some numbers. How big is your array, or more importantly, what is value of `toPass`? Unless it's in the millions+, you're losing more time starting and stopping threads, than you gain by running code in parallel. Consider using a thread pool. — Andreas, Oct 19 '18 at 19:05
Creating and starting threads takes time. Unless each thread has a massive number of operations to perform, this cost will be greater than simply doing the whole thing sequentially. If you have to do that several times, you could gain time by reusing these 4 threads, but even then, parallelizing doesn't magically make things faster. — JB Nizet, Oct 19 '18 at 19:05
Is there something else in the `GE_threaddiv` class in addition to the `run()` method? — Mick Mnemonic, Oct 19 '18 at 19:06
Thanks for the suggestions, guys! There are no additions to the run() method; that is it. toPass is the array length / 4. It is modified as it is passed into the thread to be the correct offset from the other threads. — CuriousOne, Oct 19 '18 at 19:15
@CuriousOne it wouldn't give any advantage if you process a single matrix. If you process many of them, it would allow creating the threads once and only once, and reusing them for all matrices, instead of re-creating them for each matrix. But again, if your matrices are small, processing a matrix sequentially will still be faster than processing it in 4 threads. You still haven't told us their size. — JB Nizet, Oct 19 '18 at 19:21
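To make the thread-reuse suggestion concrete, here is a minimal sketch, not the asker's actual code. It assumes a fixed pool of four workers created once and reused for every row; the class name `PooledRowDivide` and the method `divideRow` are hypothetical, and the chunk size uses ceiling division and clamping so the last chunk never runs past the end of the row. Whether this beats the serial loop still depends on the row length, as the comments above point out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PooledRowDivide {

    // Divides row[from..to) by pivot, split into `parts` chunks that run on a reused pool.
    static void divideRow(ExecutorService pool, double[] row, int from, int to,
                          double pivot, int parts) throws InterruptedException {
        int len = to - from;
        int chunk = (len + parts - 1) / parts; // ceiling division so no tail element is skipped
        List<Callable<Void>> tasks = new ArrayList<>();
        for (int p = 0; p < parts; p++) {
            final int start = Math.min(to, from + p * chunk); // clamp to the end of the range
            final int end = Math.min(to, start + chunk);
            tasks.add(() -> {
                for (int j = start; j < end; j++) {
                    row[j] /= pivot;
                }
                return null;
            });
        }
        pool.invokeAll(tasks); // blocks until every chunk has finished
    }

    public static void main(String[] args) throws InterruptedException {
        // The pool is created once, outside the loop over rows.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            double[][] A = {{2, 4, 6, 8, 10}, {3, 6, 9, 12, 15}};
            for (int k = 0; k < A.length; k++) {
                double pivot = A[k][k]; // read the pivot before the row is modified
                divideRow(pool, A[k], k + 1, A[k].length, pivot, 4);
            }
            for (double[] r : A) {
                System.out.println(Arrays.toString(r));
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

The only structural change from the question's code is that thread creation moves out of the per-row loop; the work done per chunk is the same division.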
"My guess is that the degradation in efficiency is coming from how each thread is involved with memory access." This is my suspicion as well. Instead of passing in a shared matrix, have you tried splitting the matrix such that you can pass each thread its own matrix of data to process, and then combine the results afterwards? This will allow the data to be pulled into the CPU cache instead of syncing with main memory after each operation. — Darth Android, Oct 19 '18 at 19:23
@DarthAndroid Interesting, I just put a timer around the calculation portions only, after the threads were created and as soon as they finished, and I am still getting times orders of magnitude worse than the serial time. I believe you are right, and this is my main issue. I will try your recommendation. The only worry I have is that combining the data afterwards will be just as intensive as if I am doing it on the fly. — CuriousOne, Oct 19 '18 at 19:29
@JBNizet Sorry about that, their sizes can get very large, up to 4k x 4k. The only issue is that the next row of calculations is dependent on the previous row, and there would need to be some way for all threads to apply their changes to each other's matrices, which is why I used a communal matrix to begin with. — CuriousOne, Oct 19 '18 at 19:32
So your actual code isn't the code you posted? The code you posted only divides values of a single row by a constant temp value. You would get more helpful answers if you posted a complete minimal example reproducing the actual issue you're facing. — JB Nizet, Oct 19 '18 at 19:38
I think you'll be surprised how slow calling out to main memory is. It effectively causes the instruction pipeline on every thread to stall for a large % of the time they're trying to do work. — Darth Android, Oct 19 '18 at 19:39
@JBNizet You are right, I apologize. I didn't think other portions of my code were relevant to why it was behaving slowly. But I still stand by that, I think the main issue here is that the threads are all pulling from main memory instead of cache, which provides me an explanation. So, thank you for all your input, I really appreciate it. — CuriousOne, Oct 19 '18 at 19:44
Are you guys aware of a way to utilize multithreading on a loop if the current iteration is dependent on previous iterations? I cannot see a way that would let all the threads pull from cache while, at the end of each iteration, updating each other's matrices. — CuriousOne, Oct 19 '18 at 19:46
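One common pattern for a loop whose iterations depend on each other is to keep the outer loop serial and parallelize only the independent work inside each step. The sketch below assumes the code is Gaussian elimination (suggested by the `GE_` prefix, but not confirmed in the question), without pivoting and with nonzero pivots; `StepwiseElimination` and `eliminate` are hypothetical names. Within step k, each row below the pivot only reads the pivot row and writes to itself, so those rows can be processed concurrently, and the end of the parallel `forEach` acts as the synchronization point before step k + 1.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class StepwiseElimination {

    // Forward elimination without pivoting; assumes every pivot a[k][k] is nonzero.
    static void eliminate(double[][] a) {
        int n = a.length;
        for (int k = 0; k < n; k++) { // serial: step k needs the result of step k - 1
            final int kk = k;
            final double[] pivotRow = a[kk];
            final double pivot = pivotRow[kk];
            // Rows below the pivot are independent of each other within this step.
            IntStream.range(kk + 1, n).parallel().forEach(i -> {
                double[] row = a[i];
                double factor = row[kk] / pivot;
                for (int j = kk; j < row.length; j++) {
                    row[j] -= factor * pivotRow[j];
                }
            });
            // forEach returns only after all rows of this step are done.
        }
    }

    public static void main(String[] args) {
        double[][] a = {{2, 1, 1}, {4, 3, 3}, {8, 7, 9}};
        eliminate(a);
        for (double[] row : a) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Splitting by whole rows rather than by segments of one row gives each task more work per scheduling cost, which matters for the overhead concerns raised in the comments.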

score 3 · Answer 1 · answered Oct 19 '18 at 19:10

Java 8 features may resolve your issue. Java 8 parallel streams handle the multithreading for you and are aimed at performance, so they can take much less time. I think this link can help you a bit:

list.parallelStream().forEach(element -> doWork(element));

[Java8 Multithreading]
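As a rough illustration of how the parallel-stream idea might map onto the row division from the question, here is a minimal sketch. It streams over the column indices rather than over a list, since each index is written exactly once; the class name `ParallelStreamRowDivide` and the method `divideRow` are hypothetical, and nothing here guarantees a speedup over the serial loop for short rows.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class ParallelStreamRowDivide {

    // Divides row[from..to) by pivot; each index is touched by exactly one task.
    static void divideRow(double[] row, int from, int to, double pivot) {
        IntStream.range(from, to)
                 .parallel()
                 .forEach(j -> row[j] /= pivot);
    }

    public static void main(String[] args) {
        double[] row = {2, 4, 6, 8, 10};
        divideRow(row, 1, row.length, row[0]); // row[0] plays the role of A[k][k]
        System.out.println(Arrays.toString(row));
    }
}
```

Parallel streams run on the common fork/join pool by default, so no threads are created per call, but the per-element work here is still a single division.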

Juliet.K


I think it's also worth pointing out that the parallel version may still be slower: https://stackoverflow.com/questions/23170832/java-8s-streams-why-parallel-stream-is-slower – Tyler Oct 19 '18 at 19:18
But it's still much more efficient than the old Java multithreading way – Juliet.K Oct 19 '18 at 19:20
@Juliet.K no. The threads used by parallel stream are no different from the "old java multithreading" threads. There's nothing magical about them. – JB Nizet Oct 19 '18 at 19:33
Yes, you are right, there is nothing magical, but we can use all the cores and that increases performance. I think there is a reason they introduced Java parallel streams and the fork/join pool for threads – Juliet.K Oct 19 '18 at 19:36
In this case, however, I am using the four cores available on my CPU. At least that was my impression: with four cores, I can use four concurrent threads. – CuriousOne Oct 19 '18 at 19:37
@CuriousOne, yes you could – Juliet.K Oct 19 '18 at 19:40
