I was was trying various methods of making a large computation run in parallel. This worked as expected, except when explicitly creating and starting Thread
instances, which was much slower. I added some println output to check that the threads really ran in parallel but just that change made it much quicker. The speed went from "over a second" to "tens of milliseconds", and the results were repeatable.
Extra weirdness: there was no slowness issue with ArrayList
, and println
made no obvious difference, but LinkedList
was slow until I added some println
.
I was running this on Windows 10 with Java 8. I also tried Java 11 and 12 and the slowness didn't happen (so pointless output couldn't make it faster).
My question isn't how to make it faster - I don't actually need to run this code, and I could make it parallel in another way (I recommend streams). My question is how is it possible to speed up a thread by printing some output or if you prefer why was it slow in the first place?
I tried varying the number of threads: the weirdness disappeared with one thread (!) but was present for 2 or more threads (I tested up to twice the number of cores on my machine).
I tried varying the size of the list being processed but only proved that smaller lists can be processed faster!
I tried varying the output: two println outputs seemed to increase the speed more than one, and printing at the end of a thread's life had less effect than at the beginning.
I made sure that it wasn't a GC effect (huge heap, and checked the GC log).
Here's the code I was running.
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
public class ListAddThreads {
private static final int SIZE = 20000000;
private static final int REPEATS = 3;
private static final int THREADS = Runtime.getRuntime().availableProcessors() / 2;
private static String log = "none";
public static void main( String[] args ) {
if( args.length > 0 ) switch( args[0] ) {
case "some":
case "more":
case "lots":
log = args[0];
}
System.out.println( "Array List:" );
List<Integer>listA = new ArrayList<>(SIZE);
for( int i = 0; i < SIZE; i++ )
listA.add( i );
for( int i = 0; i < REPEATS; i++ )
add( listA );
System.out.println( "\nLinked List:" );
List<Integer>listL = new LinkedList<>(listA);
for( int i = 0; i < REPEATS; i++ )
add( listL );
}
private static class AdderThread extends Thread {
private long result = 0;
private List<Integer> list;
private int from;
private int to;
AdderThread( List<Integer> list, int from, int to ) {
this.list = list;
this.from = from;
this.to = to;
}
long getResult() {
return result;
}
public void run() {
switch( log ) {
case "some":
System.err.println( "string literal" );
break;
case "more":
System.err.println( Thread.currentThread().getName() + " starting" );
break;
case "lots":
System.err.println( "string literal" );
System.err.println( Thread.currentThread().getName() + " starting" );
break;
}
Iterator<Integer> it = list.iterator();
for( int i = 0; i < to; i++ ) {
Integer j = it.next();
if( i >= from )
result += j;
}
}
}
private static void add( List<Integer> list ) {
long start = System.currentTimeMillis();
AdderThread[] adders = new AdderThread[THREADS];
int size = list.size() / THREADS;
for( int i = 0; i < THREADS; i++ ) {
int from = i * size;
int to = i == THREADS - 1 ? list.size() : (i + 1) * size;
adders[i] = new AdderThread( list, from, to );
}
for( int i = 0; i < THREADS; i++ ) {
adders[i].start();
}
long result = 0;
try {
for( int i = 0; i < THREADS; i++ ) {
adders[i].join();
result += adders[i].getResult();
}
} catch ( InterruptedException x ) {
throw new RuntimeException( "Adder thread interrupted.", x );
}
System.out.println( "Sum: " + result + ", time: " + (System.currentTimeMillis() - start) );
}
}
I called it like this:
D:\weird> for %G in (none some more lots) do java -Xms4G -Xmx4G -XX:+PrintGCDetails -Xloggc:ListAddThreads_%G_gc.log ListAddThreads %G 2>nul
Here's some example output which shows the LinkedList
time getting quicker with more output (which is redirected to oblivion):
java -Xms4G -Xmx4G -XX:+PrintGCDetails -Xloggc:ListAddThreads_none_gc.log ListAddThreads none 2>nul
Array List:
Sum: 199999990000000, time: 62
Sum: 199999990000000, time: 157
Sum: 199999990000000, time: 62
Linked List:
Sum: 199999990000000, time: 3343
Sum: 199999990000000, time: 1814
Sum: 199999990000000, time: 1796
java -Xms4G -Xmx4G -XX:+PrintGCDetails -Xloggc:ListAddThreads_some_gc.log ListAddThreads some 2>nul
Array List:
Sum: 199999990000000, time: 50
Sum: 199999990000000, time: 58
Sum: 199999990000000, time: 56
Linked List:
Sum: 199999990000000, time: 2777
Sum: 199999990000000, time: 560
Sum: 199999990000000, time: 527
java -Xms4G -Xmx4G -XX:+PrintGCDetails -Xloggc:ListAddThreads_more_gc.log ListAddThreads more 2>nul
Array List:
Sum: 199999990000000, time: 48
Sum: 199999990000000, time: 56
Sum: 199999990000000, time: 39
Linked List:
Sum: 199999990000000, time: 108
Sum: 199999990000000, time: 93
Sum: 199999990000000, time: 94
java -Xms4G -Xmx4G -XX:+PrintGCDetails -Xloggc:ListAddThreads_lots_gc.log ListAddThreads lots 2>nul
Array List:
Sum: 199999990000000, time: 52
Sum: 199999990000000, time: 68
Sum: 199999990000000, time: 39
Linked List:
Sum: 199999990000000, time: 115
Sum: 199999990000000, time: 89
Sum: 199999990000000, time: 91