1

I was was trying various methods of making a large computation run in parallel. This worked as expected, except when explicitly creating and starting Thread instances, which was much slower. I added some println output to check that the threads really ran in parallel but just that change made it much quicker. The speed went from "over a second" to "tens of milliseconds", and the results were repeatable.

Extra weirdness: there was no slowness issue with ArrayList, and println made no obvious difference, but LinkedList was slow until I added some println.

I was running this on Windows 10 with Java 8. I also tried Java 11 and 12 and the slowness didn't happen (so pointless output couldn't make it faster).

My question isn't how to make it faster - I don't actually need to run this code, and I could make it parallel in another way (I recommend streams). My question is how is it possible to speed up a thread by printing some output or if you prefer why was it slow in the first place?

I tried varying the number of threads: the weirdness disappeared with one thread (!) but was present for 2 or more threads (I tested up to twice the number of cores on my machine).

I tried varying the size of the list being processed but only proved that smaller lists can be processed faster!

I tried varying the output: two println outputs seemed to increase the speed more than one, and printing at the end of a thread's life had less effect than at the beginning.

I made sure that it wasn't a GC effect (huge heap, and checked the GC log).

Here's the code I was running.

import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

public class ListAddThreads {
    private static final int SIZE    = 20000000;
    private static final int REPEATS = 3;
    private static final int THREADS = Runtime.getRuntime().availableProcessors() / 2;

    private static String log = "none";

    public static void main( String[] args ) {
        if( args.length > 0 ) switch( args[0] ) {
            case "some":
            case "more":
            case "lots":
                log = args[0];
        }

        System.out.println( "Array List:" );
        List<Integer>listA = new ArrayList<>(SIZE);
        for( int i = 0; i < SIZE; i++ )
            listA.add( i );
        for( int i = 0; i < REPEATS; i++ )
            add( listA );

        System.out.println( "\nLinked List:" );
        List<Integer>listL = new LinkedList<>(listA);
        for( int i = 0; i < REPEATS; i++ )
            add( listL );
    }

    private static class AdderThread extends Thread {
        private long          result = 0;
        private List<Integer> list;
        private int           from;
        private int           to;

        AdderThread( List<Integer> list, int from, int to ) {
            this.list = list;
            this.from = from;
            this.to   = to;
        }

        long getResult() {
            return result;
        }

        public void run() {
            switch( log ) {
                case "some":
                    System.err.println( "string literal" );
                    break;
                case "more":
                    System.err.println( Thread.currentThread().getName() + " starting" );
                    break;
                case "lots":
                    System.err.println( "string literal" );
                    System.err.println( Thread.currentThread().getName() + " starting" );
                    break;
            }
            Iterator<Integer> it = list.iterator();
            for( int i = 0; i < to; i++ ) {
                Integer j = it.next();
                if( i >= from )
                    result += j;
            }
        }
    }

    private static void add( List<Integer> list ) {
        long          start  = System.currentTimeMillis();
        AdderThread[] adders = new AdderThread[THREADS];
        int           size   = list.size() / THREADS;

        for( int i = 0; i < THREADS; i++ ) {
            int from = i * size;
            int to   = i == THREADS - 1 ? list.size() : (i + 1) * size;

            adders[i] = new AdderThread( list, from, to );
        }
        for( int i = 0; i < THREADS; i++ ) {
            adders[i].start();
        }
        long result = 0;
        try {
            for( int i = 0; i < THREADS; i++ ) {
                adders[i].join();
                result += adders[i].getResult();
            }
        } catch ( InterruptedException x ) {
            throw new RuntimeException( "Adder thread interrupted.", x );
        }
        System.out.println( "Sum: " + result + ", time: " + (System.currentTimeMillis() - start) );
    }
}

I called it like this:

D:\weird> for %G in (none some more lots) do java -Xms4G -Xmx4G -XX:+PrintGCDetails -Xloggc:ListAddThreads_%G_gc.log ListAddThreads %G 2>nul

Here's some example output which shows the LinkedList time getting quicker with more output (which is redirected to oblivion):

java -Xms4G -Xmx4G -XX:+PrintGCDetails -Xloggc:ListAddThreads_none_gc.log ListAddThreads none  2>nul
Array List:
Sum: 199999990000000, time: 62
Sum: 199999990000000, time: 157
Sum: 199999990000000, time: 62

Linked List:
Sum: 199999990000000, time: 3343
Sum: 199999990000000, time: 1814
Sum: 199999990000000, time: 1796

java -Xms4G -Xmx4G -XX:+PrintGCDetails -Xloggc:ListAddThreads_some_gc.log ListAddThreads some  2>nul
Array List:
Sum: 199999990000000, time: 50
Sum: 199999990000000, time: 58
Sum: 199999990000000, time: 56

Linked List:
Sum: 199999990000000, time: 2777
Sum: 199999990000000, time: 560
Sum: 199999990000000, time: 527

java -Xms4G -Xmx4G -XX:+PrintGCDetails -Xloggc:ListAddThreads_more_gc.log ListAddThreads more  2>nul
Array List:
Sum: 199999990000000, time: 48
Sum: 199999990000000, time: 56
Sum: 199999990000000, time: 39

Linked List:
Sum: 199999990000000, time: 108
Sum: 199999990000000, time: 93
Sum: 199999990000000, time: 94

java -Xms4G -Xmx4G -XX:+PrintGCDetails -Xloggc:ListAddThreads_lots_gc.log ListAddThreads lots  2>nul
Array List:
Sum: 199999990000000, time: 52
Sum: 199999990000000, time: 68
Sum: 199999990000000, time: 39

Linked List:
Sum: 199999990000000, time: 115
Sum: 199999990000000, time: 89
Sum: 199999990000000, time: 91
J Banana
  • 154
  • 6
  • 1
    See: [How do I write a correct micro-benchmark in Java?](https://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java) – Jacob G. Jun 16 '19 at 22:35
  • Did you test it with durations of 2-10 seconds? Thread Management and Setup takes some time, as longer it runs as less some short term influances affect on the time measurement. – User8461 Jun 16 '19 at 22:37
  • System.out.println(...) slow down the process. If you have a process you can check it, if you output all 100 ms, only 1 per second or only the final result. It will take more time for output as for running the calculation. e.g. iterate over a input searching for matching a hash value. – User8461 Jun 16 '19 at 22:40
  • 1
    That's a lot of code for what is essentially `int sum = listA.stream().mapToInt(Integer::intValue).sum();` – Elliott Frisch Jun 16 '19 at 22:47
  • @ElliottFrisch - I agree, and I recommended using streams in the question! My parallel stream version was `list.stream().parallel().mapToLong(a->a).sum();` – J Banana Jun 16 '19 at 23:01
  • @JacobG. - thanks for the link. I hadn't seen that, but now that I have seen it I don't have any changes to make (unless you can suggest some). – J Banana Jun 16 '19 at 23:04
  • @User8461 - you and I both expected `System.out.println` to slow it down but it made it **faster**. – J Banana Jun 16 '19 at 23:06
  • 1
    The change you should make is to use [JMH](https://openjdk.java.net/projects/code-tools/jmh/) to benchmark your code and see if you get the same results. – Slaw Jun 16 '19 at 23:19
  • in my case all values of Linked List are in range of 0 to 300. IDEA + Java11 – User8461 Jun 16 '19 at 23:21
  • @User8461 - thanks for your input. I also had no speed issues in Java 11 or 12 (and I mentioned this in the question) – J Banana Jun 16 '19 at 23:39
  • > My question is how is it possible to speed up a thread by printing some output => without System.out I get the same result. – User8461 Jun 16 '19 at 23:42
  • @User8461 - you said you're using Java 11. I've also tried that, and there is no problem on that version. Can you try it with Java 8? – J Banana Jun 17 '19 at 00:48
  • @Slaw - I tried SDH, and for `LinkedList` it says `9.416 ± 0.586 ops/s` with println and `2.982 ± 2.690 ops/s` without. If I understand that correctly, it's reporting the same thing I saw: performance is improved by adding println. The `ArrayList` performance is very little affected by println. – J Banana Jun 17 '19 at 00:53

0 Answers0