13

I know the answer is No, here is an example Why single thread is faster than multithreading in Java? .

So when processing a task in a thread is trivial, the cost of creating a thread will create more overhead than distributing the task. This is one case where a single thread will be faster than multithreading.

Questions

  • Are there more cases where a single thread will be faster than multithreading?

  • When should we decide to give up multithreading and only use a single thread to accomplish our goal?

Although the question is tagged , it is also welcome to discuss beyond Java. It would be great if we could have a small example to explain in the answer.

Community
  • 1
  • 1
JaskeyLam
  • 15,405
  • 21
  • 114
  • 149
  • Check out the disruptor pattern and the LMAX architecture. – Sotirios Delimanolis Dec 05 '14 at 15:38
  • What is the question?? – dube Dec 05 '14 at 15:39
  • 3
    `First of all, threads cannot speed up execution of code. They do not make the computer run faster. All they can do is increase the efficiency of the computer by using time that would otherwise be wasted. In certain types of processing this optimization can increase efficiency and decrease running time.` - http://programmers.stackexchange.com/questions/97615/what-can-multiple-threads-do-that-a-single-thread-cannot – Ricky Mutschlechner Dec 05 '14 at 15:48
  • The question that you link to is correct that there is some overhead. However the test is fallable. Since it is so simple the compiler is probably optimizing the single thread calls into inline code so it will definitely run faster. Typically you should write your code to be single threaded until some performance testing identifies areas for improvement or if you are working on a know problem like some sort of server architecture. – markbernard Dec 05 '14 at 15:52
  • You are talking about on a single cpu, right? Because, clearly, if something is embarrassingly parallel, and you have multiple cpus available, then it will be faster. On one cpu, hard to parallelize, then no. http://stackoverflow.com/questions/1223072/forcing-multiple-threads-to-use-multiple-cpus-when-they-are-available – John Powell Dec 05 '14 at 16:01
  • @JohnBarça , No, nowadays, it is very hard to one a computer with only one cpu, this is at the circumstances of multi-cpu/cores – JaskeyLam Dec 05 '14 at 16:03
  • 4
    Understanding [Amdah'ls Law](http://en.wikipedia.org/wiki/Amdahl%27s_law) should answer your question. – Jim Mischel Dec 05 '14 at 16:05
  • Sure, but you didn't state that. Anwyway, it still essentially comes down to the task at hand, and how easy it is to parallelize -- a lot of which will be done by the JVM. Just because you have multiple cores, doesn't necessarily mean they will be used, and there will be overhead in communication. – John Powell Dec 05 '14 at 16:06
  • @JohnBarça as the question says: if a task is very simple like printing 100 System.out.println("hello"); then obviously you should not go for multithreading, since this will certainly not be any faster. – javaHunter Dec 05 '14 at 16:07
  • Yes, Amdahl's law, more succinct than what I was saying. – John Powell Dec 05 '14 at 16:07
  • @javaHunter. The question says no such thing. It says trivial, but this is distinctly open to interpretation. – John Powell Dec 05 '14 at 16:10

7 Answers7

15

This is a very good question regarding threading and its link to the real work, meaning the available physical CPU(s) and its cores and hyperthreads.

  1. Multiple threads might allow you to do things in parallel, if your CPU has more than one core available. So in an ideal world, e.g. calulating some primes, might be 4 times faster using 4 threads, if your CPU has 4 cores available and your algorithm work really parallel.
  2. If you start more threads as cores are available, the thread management of your OS will spend more and more time in Thread-Switches and in such your effiency using your CPU(s) becomes worse.
  3. If the compiler, CPU cache and/or runtime realized that you run more than one thread, accessing the same data-area in memory, is operates in a different optimization mode: As long as the compile/runtime is sure that only one thread access the data, is can avoid writing data out to extenral RAM too often and might efficently use the L1 cache of your CPU. If not: Is has to activate semaphores and also flush cached data more often from L1/L2 cache to RAM.

So my lessons learned from highly parrallel multithreading have been:

  • If possible use single threaded, shared-nothing processes to be more efficient
  • If threads are required, decouple the shared data access as much as possible
  • Don't try to allocate more loaded worker threads than available cores if possible

Here a small programm (javafx) to play with. It:

  • Allocates a byte array of 100.000.000 size, filled with random bytes
  • Provides a method, counting the number of bits set in this array
  • The method allow to count every 'nth' bytes bits
  • count(0,1) will count all bytes bits
  • count(0,4) will count the 0', 4', 8' byte bits allowing a parallel interleaved counting

Using a MacPro (4 cores) results in:

  1. Running one thread, count(0,1) needs 1326ms to count all 399993625 bits
  2. Running two threads, count(0,2) and count(1,2) in parallel needs 920ms
  3. Running four threads, needs 618ms
  4. Running eight threads, needs 631ms

enter image description here enter image description here enter image description here enter image description here

Changing the way to count, e.g. incrementing a commonly shared integer (AtomicInteger or synchronized) will dramatically change the performance of many threads.

public class MulithreadingEffects extends Application {
    static class ParallelProgressBar extends ProgressBar {
        AtomicInteger myDoneCount = new AtomicInteger();
        int           myTotalCount;
        Timeline      myWhatcher = new Timeline(new KeyFrame(Duration.millis(10), e -> update()));
        BooleanProperty running = new SimpleBooleanProperty(false);

        public void update() {
            setProgress(1.0*myDoneCount.get()/myTotalCount);
            if (myDoneCount.get() >= myTotalCount) {
                myWhatcher.stop();
                myTotalCount = 0;
                running.set(false);
            }
        }

        public boolean isRunning() { return myTotalCount > 0; }
        public BooleanProperty runningProperty() { return running; }

        public void start(int totalCount) {
            myDoneCount.set(0);
            myTotalCount = totalCount;
            setProgress(0.0);
            myWhatcher.setCycleCount(Timeline.INDEFINITE);
            myWhatcher.play();
            running.set(true);
        }

        public void add(int n) {
            myDoneCount.addAndGet(n);
        }
    }

    int mySize = 100000000;
    byte[] inData = new byte[mySize];
    ParallelProgressBar globalProgressBar = new ParallelProgressBar();
    BooleanProperty iamReady = new SimpleBooleanProperty(false);
    AtomicInteger myCounter = new AtomicInteger(0);

    void count(int start, int step) {
        new Thread(""+start){
            public void run() {
                int count = 0;
                int loops = 0;
                for (int i = start; i < mySize; i+=step) {
                    for (int m = 0x80; m > 0; m >>=1) {
                        if ((inData[i] & m) > 0) count++;
                    }
                    if (loops++ > 99) {
                        globalProgressBar.add(loops);
                        loops = 0;
                    }
                }
                myCounter.addAndGet(count);
                globalProgressBar.add(loops);
            }
        }.start();
    }

    void pcount(Label result, int n) {
        result.setText("("+n+")");
        globalProgressBar.start(mySize);
        long start = System.currentTimeMillis();
        myCounter.set(0);
        globalProgressBar.runningProperty().addListener((p,o,v) -> {
            if (!v) {
                long ms = System.currentTimeMillis()-start;
                result.setText(""+ms+" ms ("+myCounter.get()+")");
            }
        });
        for (int t = 0; t < n; t++) count(t, n);
    }

    void testParallel(VBox box) {
        HBox hbox = new HBox();

        Label result = new Label("-");
        for (int i : new int[]{1, 2, 4, 8}) {
            Button run = new Button(""+i);
            run.setOnAction( e -> {
                if (globalProgressBar.isRunning()) return;
                pcount(result, i);
            });
            hbox.getChildren().add(run);
        }

        hbox.getChildren().addAll(result);
        box.getChildren().addAll(globalProgressBar, hbox);
    }


    @Override
    public void start(Stage primaryStage) throws Exception {        
        primaryStage.setTitle("ProgressBar's");

        globalProgressBar.start(mySize);
        new Thread("Prepare"){
            public void run() {
                iamReady.set(false);
                Random random = new Random();
                random.setSeed(4711);
                for (int i = 0; i < mySize; i++) {
                    inData[i] = (byte)random.nextInt(256);
                    globalProgressBar.add(1);
                }
                iamReady.set(true);
            }
        }.start();

        VBox box = new VBox();
        Scene scene = new Scene(box,400,80,Color.WHITE);
        primaryStage.setScene(scene);

        testParallel(box);
        GUIHelper.allowImageDrag(box);

        primaryStage.show();   
    }

    public static void main(String[] args) { launch(args); }
}
Jens-Peter Haack
  • 1,887
  • 13
  • 18
8

As already mentionened in a comment by @Jim Mischel, you can use

Amdahl's law

to calculate this. Amdahl's law states that the speedup gained from adding processors to solve a task is

enter image description here

where

N is the number of processors, and

P is the fraction of the code that can be executed in parallel (0 .. 1)

Now if T is the time it takes to execute the task on a single processor, and O is the total 'overhead' time (create and set up a second thread, communication, ...), a single thread is faster if

T < T/S(2) + O

or, after reordering, if

O/T > P/2

When the ratio Overhead / Execution Time is greater than P/2, a single thread is faster.

alain
  • 11,939
  • 2
  • 31
  • 51
  • I have a question on this. If a task is executed at one processor with time of T. While we distribute this task into two parts to two processors, the time cost by two processors should be less then T(maybe 1/2), so how could we use S(2) to * T? – JaskeyLam Dec 24 '14 at 03:24
  • @Jaskey Sorry for my late answer. You're right, there was an error, I fixed it now. – alain Jan 16 '15 at 16:50
5

Not all algorithms can be processed in parallel (algorithms that are strictly sequential; where P=0 in Amdahl's law) or at least not efficiently (see P-complete). Other algorithms are more suitable for parallel execution (extreme cases are called "embarrassingly parallel").

A naive implementation of a parallel algorithm can be less efficient in terms of complexity or space compared to a similar sequential algorithm. If there is no obvious way to parallelize an algorithm so that it will get a speedup, you may need to choose another similar parallel algorithm that solves the same problem but can be more or less efficient. If you ignore thread/process creation and direct inter-process communication overhead, there can still be other limiting factors when using shared resources like IO bottlenecks or increased paging caused by higher memory consumption.

When should we decide to give up multithreading and only use a single thread to accomplish our goal?

When deciding between single and multithreading, the time needed to change the implementation and the added complexity for developers should be taken into account. If there is only small gain by using multiple threads you could argue that the increased maintenance cost that are usually caused by multi-threaded applications are not worth the speedup.

kapex
  • 28,903
  • 6
  • 107
  • 121
  • Thank you! "even if you ignore all thread creation and communication overhead the serial algorithm in one thread could be still more efficient and faster." Could I have an example for this where single thread will be still faster even we ignore overhead of creating a thread and communicate among thread? – JaskeyLam Dec 08 '14 at 02:59
  • Thinking about it a bit more, saying "all communication" probably is too broad and not true. You could always argue that a trivial parallel implementation (like where one thread does all the work and the other threads don't contribute to the result in any way) is as just as fast if you ignore _all_ overhead. I've changed the answer a bit. – kapex Dec 08 '14 at 13:53
4

Threading is about taking advantage of idle resources to handle more work. If you have no idle resources, multi-threading has no advantages, so the overhead would actually make your overall runtime longer.

For example, if you have a collection of tasks to perform and they are CPU-intensive calculations. If you have a single CPU, multi-threading probably wouldn't speed that process up (though you never know until you test). I would expect it to slow down slightly. You are changing how the work is split up, but no changes in capacity. If you have 4 tasks to do on a single CPU, doing them serially is 1 * 4. If you do them in parallel, you'll come out to basically 4 * 1, which is the same. Plus, the overhead of merging results and context switching.

Now, if you have multiple CPU's, then running CPU-intensive tasks in multiple threads would allow you to tap unused resources, so more gets done per unit time.

Also, think about other resources. If you have 4 tasks which query a database, running them in parallel helps if the database has extra resources to handle them all. Though, you are also adding more work, which removes resources from the database server, so I probably wouldn't do that.

Now, let's say we need to make web service calls to 3 external systems and none of the calls have input dependent on each other. Doing them in parallel with multiple threads means that we don't have to wait for one to end before the other starts. It also means that running them in parallel won't negatively impact each task. This would be a great use case for multi-threading.

Brandon
  • 9,822
  • 3
  • 27
  • 37
3

The overhead may be not only for creation, but for thread-intercommunications. The other thing that should be noted that synchronization of threads on a, for example, single object may lead to alike single thread execution.

loshad vtapkah
  • 429
  • 4
  • 11
2

Are there more cases where a single thread will be faster than multithreading?

So in a GUI application you will benefit from multithreading. At the most basic level you will be updating the front end as well as what the front end is presenting. If you're running something basic like hello world then like you showed it would be more overhead.

That question is very broad... Do you count Unit Tests as applications? If so then there are probably more applications that use single threads because any complex system will have (hopefully) at least 1 unit test. Do you count every Hello world style program as a different application or the same? If an application is deleted does it still count?

As you can see I can't give a good response other than you would have to narrow the scope of your question to get a meaningful answer. That being said this may be a statistic out there that I'm unaware of.

When should we decide to give up multithreading and only use a single thread to accomplish our goal?

When multithreading will perform 'better' by whatever metric you think is important.

Can your problem be broken into parts that can be processed simultaneously? Not in a contrived way like breaking Hello World into two threads where one thread waits on the other to print. But in a way that 2+ threads would be able to accomplish the task more efficiently than one?

Even if a task is easily parallelizable doesn't mean that it should be. I could multithread an application that trolled thousands of new sites constantly to get me my news. For me personally this would suck because it would eat my pipe up and I wouldn't be able to get my FPS in. For CNN this might be exactly what they want and will build a mainframe to accommodate it.

Can you narrow your questions?

Carlos Bribiescas
  • 4,197
  • 9
  • 35
  • 66
1

There are 2 scenarios that can happen here :

  1. MultiThreading on Single Core CPU :

    1.1 When to use : Multithreading helps when tasks that needs parallelism are IO bound.Threads give up execution while they wait for IO and OS assign the time slice to other waiting threads. Sequential execution do not have the behavior - Multithreads will boost the performance.

    1.2 When Not to Use : When the tasks are not IO bound and mere a calculation of something, you might not want to go for multi threading since thread creation and context switching will negate the gain if any. - Multithreads will have least impact.

  2. MultiThreading in Multi Core CPU : Multi core can run as many threads as the number of core in CPU. This will surely have performance boost. But Running higher number of threads than the available cores will again introduce the thread context switching problem. - Multithreads will surely have impact.

Be aware : Also there is limit on adding/introducing number of threads in system. More context switches will negate overall gain and application slows down.

sunil bhardwaj
  • 259
  • 3
  • 6