3

My program is trying to sum a range with a given number of threads in order to run it in parallel but it seems that with just one threads it runs better than 4 (I have an 8 core CPU). It is my first time working with multithreading in Java so maybe I have a problem in my code that makes it take longer?

My benchmarks(sum of range 0-10000) done for the moment are:

1 thread: 1350 microsecs (average) 2 thread: 1800 microsecs (average) 4 thread: 2400 microsecs (average) 8 thread: 3300 microsecs (average)

Thanks in advance!

/*
Compile: javac RangeSum.java
Execute: java RangeSum nThreads initRange finRange
*/

import java.util.ArrayList;
import java.util.concurrent.*;

public class RangeSum implements Runnable {

private int init;
private int end; 
private int id;
static public int out = 0;

Object lock = new Object();

public synchronized static void increment(int partial) {
    out = out + partial;
}

public RangeSum(int init,int end) { 
    this.init = init;
    this.end = end;
}//parameters to pass in threads

// the function called for each thread
public void run() {
    int partial = 0;

    for(int k = this.init; k < this.end; k++)
    {
        partial = k + partial + 1;
    }
    increment(partial);
}//thread: sum its id to the out variable

public static void main(String args[]) throws InterruptedException {
    final long startTime = System.nanoTime()/1000;//start time: microsecs

    //get command line values for
    int NumberOfThreads = Integer.valueOf(args[0]);
    int initRange = Integer.valueOf(args[1]);
    int finRange = Integer.valueOf(args[2]);
    //int[] out = new int[NumberOfThreads];

    // an array of threads
    ArrayList<Thread> Threads = new ArrayList<Thread>(NumberOfThreads);

    // spawn the threads / CREATE
    for (int i = 0; i < NumberOfThreads; i++) {
        int initial = i*finRange/NumberOfThreads;
        int end = (i+1)*finRange/NumberOfThreads;
        Threads.add(i, new Thread(new RangeSum(initial,end)));
        Threads.get(i).start();
    }

    // wait for the threads to finish / JOIN
    for (int i = 0; i < NumberOfThreads; i++) {
        try {
            Threads.get(i).join();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }

    System.out.println("All threads finished!");

    System.out.println("Total range sum: " + out);

    final long endTime = System.nanoTime()/1000;//end time
    System.out.println("Time elapsed: "+(endTime - startTime));
}
}
  • 1
    Possible duplicate of [How do I write a correct micro-benchmark in Java?](https://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java) – Jacob G. Apr 20 '18 at 21:40
  • I need all the threads to be finished before getting the solution so I use join them at the end, shouldn't be like this? – Sergi Olives Orfila Apr 20 '18 at 21:48
  • If I remove the join, it basically executes the first thread and finishes the program without executing the rest of the threads. Also, I create them but also execute before making the join. – Sergi Olives Orfila Apr 20 '18 at 21:57
  • Starting a thread takes time. Your code runs so fast that the overhead of starting a thread exceeds the time to calculate. – Andreas Apr 20 '18 at 21:58
  • At least you should post your input values and the timing you get. A small table with different input values would be good. – algrid Apr 20 '18 at 22:00
  • @HovercraftFullOfEels Was code changed without first 5 minutes? Because current code (i.e. *only* code as far as I can see) starts N threads, which will run in parallel, and main code then waits for them to complete. Nothing pointless about that. – Andreas Apr 20 '18 at 22:00
  • @nitnamby Theory was that allowing your multi-core CPU to process different parts of the "range" in parallel would return result *faster* than a single thread having to do all the work. – Andreas Apr 20 '18 at 22:09
  • Edit: I was misreading the code, thinking that he was joining all threads to each other. My apologies. – Hovercraft Full Of Eels Apr 20 '18 at 22:12
  • The idea is a basic program to understand multithreading, it is a uni assignment. That's why it could seem not a relevant and useful code but it should work in parallel after all. – Sergi Olives Orfila Apr 20 '18 at 22:14
  • @SergiOlivesOrfila I think that this is a good exercise which shows you that throwing more threads / cores onto a problem may do more harm than good. – David Soroko Apr 20 '18 at 22:49
  • @SergiOlivesOrfila Please take a look at the java naming conventions. Variables should always be spelled lower case. So `NumberOfThreads` should be `numberOfThreads`. – Neuron Apr 20 '18 at 23:26

1 Answers1

1

Your workload entirely in memory-non-blocking computation - on a general principle, in this kind of scenario, a single thread will complete the work faster than multiple threads.

Multiple threads tend to interfere with the L1/L2 CPU caching and incur additional overhead for context switching

Specifically, wrt to your code, you initialize final long startTime = System.nanoTime()/1000; too early and measure thread setup time as well as the actual time it takes them to complete. Its probably better to setup your Threads list first and then:

final long startTime =...
for (int i = 0; i < NumberOfThreads; i++) {
    Thread.get(i).start()
}

but really, in this case, the expectations that multiple threads will improve processing time is not warranted.

David Soroko
  • 8,521
  • 2
  • 39
  • 51
  • Could you elaborate a bit on that? Those threads are just doing some computations without accessing memory too much. Why would it mess up caching? And since the number of those threads is less than the number of cpus no significant context switch overhead should appear. That looks like a perfect fit for parallelization – algrid Apr 20 '18 at 22:53
  • The content of an array lives on the heap, to perform a calculation the CPU needs to fetch values from the heap and act on them. This is relatively slow so CPUs have local cahes which can be accessed much faster this is often combined with prefetching where chunks of the data are fetched (once) into a CPU cache so that calculations don't need to wait for data from the heap. See here https://en.wikipedia.org/wiki/CPU_cache and here https://en.wikipedia.org/wiki/Cache_prefetching for more details – David Soroko Apr 20 '18 at 23:07
  • What array are you talking about? It’s just adding numbers from 1 to n. – algrid Apr 20 '18 at 23:10
  • Btw. my first thought when I saw the question was also that it sums an array :) – algrid Apr 20 '18 at 23:19