So basically I needed to optimize this piece of code today. It tries to find the longest sequence produced by some function for the first million starting numbers:
public static void main(String[] args) {
int mostLen = 0;
int mostInt = 0;
long currTime = System.currentTimeMillis();
for(int j=2; j<=1000000; j++) {
long i = j;
int len = 0;
while((i=next(i)) != 1) {
len++;
}
if(len > mostLen) {
mostLen = len;
mostInt = j;
}
}
System.out.println(System.currentTimeMillis() - currTime);
System.out.println("Most len is " + mostLen + " for " + mostInt);
}
static long next(long i) {
if(i%2==0) {
return i/2;
} else {
return i*3+1;
}
}
My mistake was to try to introduce multithreading:
void doSearch() throws ExecutionException, InterruptedException {
final int numProc = Runtime.getRuntime().availableProcessors();
System.out.println("numProc = " + numProc);
ExecutorService executor = Executors.newFixedThreadPool(numProc);
long currTime = System.currentTimeMillis();
List<Future<ValueBean>> list = new ArrayList<Future<ValueBean>>();
for (int j = 2; j <= 1000000; j++) {
MyCallable<ValueBean> worker = new MyCallable<ValueBean>();
worker.setBean(new ValueBean(j, 0));
Future<ValueBean> f = executor.submit(worker);
list.add(f);
}
System.out.println(System.currentTimeMillis() - currTime);
int mostLen = 0;
int mostInt = 0;
for (Future<ValueBean> f : list) {
final int len = f.get().getLen();
if (len > mostLen) {
mostLen = len;
mostInt = f.get().getNum();
}
}
executor.shutdown();
System.out.println(System.currentTimeMillis() - currTime);
System.out.println("Most len is " + mostLen + " for " + mostInt);
}
public class MyCallable<T> implements Callable<ValueBean> {
public ValueBean bean;
public void setBean(ValueBean bean) {
this.bean = bean;
}
public ValueBean call() throws Exception {
long i = bean.getNum();
int len = 0;
while ((i = next(i)) != 1) {
len++;
}
return new ValueBean(bean.getNum(), len);
}
}
public class ValueBean {
int num;
int len;
public ValueBean(int num, int len) {
this.num = num;
this.len = len;
}
public int getNum() {
return num;
}
public int getLen() {
return len;
}
}
long next(long i) {
if (i % 2 == 0) {
return i / 2;
} else {
return i * 3 + 1;
}
}
Unfortunately, the multithreaded version worked 5 times slower than the single-threaded on 4 processors (cores).
Then I tried a bit more crude approach:
static int mostLen = 0;
static int mostInt = 0;
synchronized static void updateIfMore(int len, int intgr) {
if (len > mostLen) {
mostLen = len;
mostInt = intgr;
}
}
public static void main(String[] args) throws InterruptedException {
long currTime = System.currentTimeMillis();
final int numProc = Runtime.getRuntime().availableProcessors();
System.out.println("numProc = " + numProc);
ExecutorService executor = Executors.newFixedThreadPool(numProc);
for (int i = 2; i <= 1000000; i++) {
final int j = i;
executor.execute(new Runnable() {
public void run() {
long l = j;
int len = 0;
while ((l = next(l)) != 1) {
len++;
}
updateIfMore(len, j);
}
});
}
executor.shutdown();
executor.awaitTermination(30, TimeUnit.SECONDS);
System.out.println(System.currentTimeMillis() - currTime);
System.out.println("Most len is " + mostLen + " for " + mostInt);
}
static long next(long i) {
if (i % 2 == 0) {
return i / 2;
} else {
return i * 3 + 1;
}
}
and it worked much faster, but still it was slower than the single thread approach.
I hope it's not because I screwed up the way I'm doing multithreading, but rather this particular calculation/algorithm is not a good fit for parallel computation. If I change calculation to make it more processor intensive by replacing method next
with:
long next(long i) {
Random r = new Random();
for(int j=0; j<10; j++) {
r.nextLong();
}
if (i % 2 == 0) {
return i / 2;
} else {
return i * 3 + 1;
}
}
both multithreaded versions start to execute more than twice as fast than the singlethreaded version on a 4 core machine.
So clearly there must be some threshold that you can use to determine if it is worth to introduce multithreading and my question is:
What is the basic rule that would help decide if a given calculation is intensive enough to be optimized by running it in parallel (without spending effort to actually implement it?)