Some background: I created a contrived example to demonstrate the use of VisualVM to my team. In particular, one method had an unnecessary synchronized keyword, and we could see threads in the thread pool blocking where they didn't need to. But removing that keyword had the surprising effect described below. The code below is the simplest case I could reduce the original example to that still reproduces the issue, and using a ReentrantLock produces the same effect.
Please consider the code below (full runnable example at https://gist.github.com/revbingo/4c035aa29d3c7b50ed8b - you need to add Commons Math 3.4.1 to the classpath). It submits 100 tasks to a thread pool of 5 threads. Each task creates two 500x500 matrices of random values and multiplies them.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantLock;

import org.apache.commons.math3.linear.MatrixUtils;
import org.apache.commons.math3.linear.RealMatrix;
import org.apache.commons.math3.random.JDKRandomGenerator;
import org.apache.commons.math3.random.StableRandomGenerator;
import org.apache.commons.math3.random.UncorrelatedRandomVectorGenerator;

public class Main {

    private static ExecutorService exec = Executors.newFixedThreadPool(5);
    private final static int MATRIX_SIZE = 500;
    private static UncorrelatedRandomVectorGenerator generator =
        new UncorrelatedRandomVectorGenerator(MATRIX_SIZE,
            new StableRandomGenerator(new JDKRandomGenerator(), 0.1d, 1.0d));
    private static ReentrantLock lock = new ReentrantLock();

    public static void main(String[] args) throws Exception {
        for (int i = 0; i < 100; i++) {
            exec.execute(new Runnable() {
                @Override
                public void run() {
                    double[][] matrixArrayA = new double[MATRIX_SIZE][MATRIX_SIZE];
                    double[][] matrixArrayB = new double[MATRIX_SIZE][MATRIX_SIZE];
                    for (int j = 0; j < MATRIX_SIZE; j++) {
                        matrixArrayA[j] = generator.nextVector();
                        matrixArrayB[j] = generator.nextVector();
                    }
                    RealMatrix matrixA = MatrixUtils.createRealMatrix(matrixArrayA);
                    RealMatrix matrixB = MatrixUtils.createRealMatrix(matrixArrayB);

                    lock.lock();
                    matrixA.multiply(matrixB);
                    lock.unlock();
                }
            });
        }
    }
}
The ReentrantLock is actually unnecessary. There is no shared state between the threads that needs synchronization. With the lock in place, we observe the threads in the thread pool blocking, as expected. With the lock removed, the blocking disappears and all threads run fully in parallel, again as expected.
The unexpected result of removing the lock is that the code consistently takes longer to complete, on my machine (a quad-core i7) by 15-25%. Profiling the code shows no indication of any blocking or waiting in the threads, and total CPU usage is only around 50%, spread fairly evenly across the cores.
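For reference, the shape of the timing comparison can be sketched with plain JDK code like this (a self-contained sketch: the Random-summing loop is only a stand-in for the matrix multiply so that Commons Math isn't needed, and the Timing/run names are illustrative, not from my actual code):

```java
import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class Timing {

    // Submits 100 tasks to a 5-thread pool and returns the wall-clock time
    // for all of them to finish. The task body is a CPU-bound stand-in for
    // the matrix work, optionally serialised by the lock.
    public static long run(boolean useLock) {
        ExecutorService exec = Executors.newFixedThreadPool(5);
        ReentrantLock lock = new ReentrantLock();
        long start = System.nanoTime();
        for (int i = 0; i < 100; i++) {
            exec.execute(() -> {
                double sum = 0;
                Random r = new Random(42);   // stand-in work, not the real multiply
                if (useLock) lock.lock();
                try {
                    for (int k = 0; k < 100_000; k++) sum += r.nextDouble();
                } finally {
                    if (useLock) lock.unlock();
                }
                if (sum < 0) System.out.println(sum); // keep the work observable
            });
        }
        exec.shutdown();
        try {
            exec.awaitTermination(5, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        System.out.printf("with lock:    %d ms%n", run(true));
        System.out.printf("without lock: %d ms%n", run(false));
    }
}
```

With the trivial stand-in task this harness behaves "normally" (the locked run is slower); the anomaly only shows up with the real matrix/generator workload.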
The second unexpected thing is that this is also dependent on the type of generator that is used. If I use a GaussianRandomGenerator or a UniformRandomGenerator instead of the StableRandomGenerator, the expected result is observed: the code runs faster (by around 10%) when the lock() is removed.
If the threads are not blocking, the CPU is at a reasonable level, and there is no I/O involved, how can this be explained? The only clue I really have is that the StableRandomGenerator does invoke a lot of trigonometric functions, so it is clearly much more CPU-intensive than the Gaussian or Uniform generators, but why then am I not seeing the CPU being maxed out?
EDIT: Another important point (thanks Joop): making the generator local to the Runnable (i.e. one per thread) restores the normal expected behaviour, where adding the lock slows the code by around 50%. So the key conditions for the odd behaviour are a) using a StableRandomGenerator, and b) having that generator shared between the threads. But to the best of my knowledge, that generator is thread-safe.
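The "one generator per thread" variant can be sketched with a ThreadLocal. In this sketch java.util.Random stands in for the Commons Math generator chain, since only the sharing pattern matters here (PerThread and nextVector are illustrative names):

```java
import java.util.Random;

public class PerThread {

    // One private generator per thread instead of one shared instance.
    // Random is a stand-in for the UncorrelatedRandomVectorGenerator chain.
    private static final ThreadLocal<Random> GEN =
        ThreadLocal.withInitial(Random::new);

    public static double[] nextVector(int size) {
        Random r = GEN.get();            // this thread's own generator
        double[] v = new double[size];
        for (int i = 0; i < size; i++) {
            v[i] = r.nextDouble();
        }
        return v;
    }
}
```

With this arrangement no state is shared between threads at all, and locking versus not locking behaves exactly as you'd expect.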
EDIT2: Whilst this question is superficially very similar to the linked duplicate, and the answer there is plausible and almost certainly a factor, I'm yet to be convinced it's quite as simple as that. Things that make me question it:
1) The problem only appears when synchronizing on the multiply() operation, which makes no calls to Random. My immediate thought was that the synchronization ends up staggering the threads to some extent, and therefore "accidentally" improves the performance of Random#next(). However, synchronizing on the calls to generator.nextVector() (which in theory has the same effect, in the "proper" place) does not reproduce the issue - synchronizing there slows the code down, as you might expect.
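To be concrete, the variant in point 1, i.e. locking around the generator calls instead of the multiply, looks roughly like this (again with java.util.Random standing in for the shared Commons Math generator; LockedGen and nextVector are illustrative names):

```java
import java.util.Random;
import java.util.concurrent.locks.ReentrantLock;

public class LockedGen {

    private static final Random GEN = new Random();          // shared generator
    private static final ReentrantLock LOCK = new ReentrantLock();

    // Serialise access to the shared generator itself, leaving the
    // (thread-confined) matrix multiply completely unsynchronised.
    public static double[] nextVector(int size) {
        double[] v = new double[size];
        LOCK.lock();
        try {
            for (int i = 0; i < size; i++) {
                v[i] = GEN.nextDouble();
            }
        } finally {
            LOCK.unlock();
        }
        return v;
    }
}
```

If contention on the shared generator were the whole story, this "proper" synchronization should reproduce the speed-up, but it doesn't.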
2) The problem is only observed with the StableRandomGenerator, even though the other implementations of NormalizedRandomGenerator also use the JDKRandomGenerator (which, as pointed out, is just a wrapper for java.util.Random). In fact, when I replaced the RandomVectorGenerator with direct calls to Random#nextDouble to fill in the matrices, behaviour again reverted to the expected result - synchronizing any part of the code causes the total throughput to drop.
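The direct-fill replacement mentioned above is simply this kind of thing (a sketch; randomMatrix is an illustrative name):

```java
import java.util.Random;

public class DirectFill {

    // Fill a matrix with direct calls to a shared java.util.Random,
    // bypassing the Commons Math generator chain entirely.
    public static double[][] randomMatrix(Random shared, int size) {
        double[][] m = new double[size][size];
        for (int i = 0; i < size; i++) {
            for (int j = 0; j < size; j++) {
                m[i][j] = shared.nextDouble();
            }
        }
        return m;
    }
}
```

Even though the Random instance is still shared between the pool threads, this version shows none of the odd behaviour.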
In summary, the issue is only observed when:
a) using the StableRandomGenerator - no other subclass of NormalizedRandomGenerator, nor the JDKRandomGenerator or java.util.Random used directly, shows the same behaviour; and
b) synchronizing the call to RealMatrix#multiply - the same behaviour is not observed when synchronizing the calls to the random generator.