2

I am writing a program dealing with matrix parallel programming using Executorservice framework. And I set the fixedpoolsize to 4, however what surprises me is that when the matrix dimension is set to 5000, the speedup of using multithreading against serial execution is greater than 4 (which is also my CPU cores). And I have checked that my CPU does not support for hyperthreading.

Actually I use the Callable and Future container since my multithreading task requires the result to be returned.


// Part of code for parallel programming   

   double[][] x = new double[N][N];
    List<Future<double[]>> futureList = new ArrayList<>(); 
    for (int k=0;k<N;k++)
    {
        Future<double[]>temp=service.submit(new Thread.Task(N,k,matrix,vector));
        futureList.add(temp);  
    }
    for (int j = 0; j < N; j++) {
           x[j]=futureList.get(j).get(); 
    }

     public double[] call() throws Exception {
        for (int i = N - 1; i >= 0; i--)  
        {
            double sum = 0;
            for (int j = i + 1; j < N; j++)  
            {
                sum += matrix[i][j] * x[j];   
            }
            x[i] = (vector[i][k] - sum) / matrix[i][i]; 
        }
        return x;
    }

 // Part of code for Serial programming

    double[][] x = new double[N][N]; 
    for (int k=0;k<N;k++)
    {
        for (int i = N - 1; i >= 0; i--)  
        {
            double sum = 0;
            for (int j = i + 1; j < N; j++)  
            {
                sum += matrix[i][j] * x[j][k];   
            }
            x[i][k] = (vector[i][k] - sum) / matrix[i][i]; 
        }

    }

In short, I just take the inside loop away to let it be run by the thread and leave the outside loop unchanged.

But how can the speedup be like this?

Since from my previous concept it is that the maximum speedup can only be 4. And I have checked that the task is just done by 4 threads actually.

user3666197
  • 1
  • 6
  • 50
  • 92
CHEN2408
  • 35
  • 3
  • 1
    You wrote : "***speedup** of using multithreading against serial execution **is greater than 4***" - Q1) **How** did you measure the **speedup**? Q2) **Where** is a fully-reproducible MCVE-code to repeat such measurements again? – user3666197 Oct 30 '19 at 16:43
  • As pointed above, the first thing to do is to make sure it is not a measurement error. See https://stackoverflow.com/q/504103/829571 – assylias Oct 30 '19 at 17:14
  • I encapsulate the two programs into two function, and call them in my main program. Actually I just call one of them each time @user3666197 – CHEN2408 Oct 31 '19 at 04:18
  • Q2) **Where** is a **fully-reproducible MCVE-code** to repeat such measurements again? ( like completing this https://tio.run/##1VI9b4MwEN3zK26EEpJ0dkDq0E4VqtRsiMEBJzVfRsYkVFV@Oz2bhEBUqelYC9ngu3u8e@9SeqCuqFiZJlnXLZfwRqUCsYNYJAx2QkJFJc1zlkMlxV7SouDlHgBmM9wS0WxzFkZhBC14ULLj5SqI8CE6B155rdYvjWokW18KfB925kYHz5VPUtJPk@xbNgFTqwlYvFSQeSuSrQOSOY5tIl9m1@sWWrGi8momDzxmi7rZFlxZGn and posting the updated link with the fully-reproducible -- i.e. indeed working -- MCVE-code state ? ) – user3666197 Oct 31 '19 at 12:51

2 Answers2

0

Threads can be utilized on the same cpu. You do not need a multi core processor to execute multi threaded applications.

Think of a thread as small process, that gets created by the parent program and destroyed once its done. Even single cpu computers can run multiple threads at once.

ExecutorService schedules threads to be executed and will run as many parallel threads as there are resources available including the cores.

Here is the docs on fixedThreadPool

public static ExecutorService newFixedThreadPool(int nThreads)

Creates a thread pool that reuses a fixed number of threads operating off a shared unbounded queue. At any point, at most nThreads threads will be active processing tasks. If additional tasks are submitted when all threads are active, they will wait in the queue until a thread is available. If any thread terminates due to a failure during execution prior to shutdown, a new one will take its place if needed to execute subsequent tasks. The threads in the pool will exist until it is explicitly shutdown

You can also try workStealingPool

public static ExecutorService newWorkStealingPool()

Creates a work-stealing thread pool using all available processors as its target parallelism level.

locus2k
  • 2,802
  • 1
  • 14
  • 21
  • But if the operation is using 100% of the CPU, adding more threads won't speedup anything. – assylias Oct 30 '19 at 17:11
  • That is correct, but the OP didn't mention what his CPU was running at percentage wise. `ExecutorService` will schedule thread execution depending on current CPU utilization/resources available. – locus2k Oct 30 '19 at 17:22
  • Seeing that it's a pure math computation, I would expect the CPU to be fully utilised. – assylias Oct 30 '19 at 18:17
  • All depends on the CPU and resources available. It could utilize 100% or not. – locus2k Oct 30 '19 at 18:21
  • I check for the CPU that when the code comses to the multitheading part, the CPU utilization is nearly 100%, actually my confusion is that I only define 4 threads for 4 cores in my theadpool, but how can the speedup is more then 4? – CHEN2408 Oct 31 '19 at 04:00
0

This could be an effect of CPU cache affinity. If each core works on a different part of a problem it may achieve greater efficiency in cache use. Because RAM is up to 10's or more times slower than cache this can make a HUGE difference.

Tod Harter
  • 516
  • 5
  • 8
  • Is it the phenomenon called "superlinear speedup"? And I check for the potential reason that causes the superlinear speedup which is same as you just mentioned. – CHEN2408 Oct 31 '19 at 04:02
  • Yeah, of course it is pretty hard to say what the real reason is without doing some fairly rigorous analysis. Ideally you can use CPU tracing, or if you are oldschool a real hardware logic analyzer. Nothing beats that! :) – Tod Harter Jun 07 '21 at 21:13