I have a huge byte array that needs to be processed. In theory, it should be possible to slice the work into even pieces and assign them to different threads to increase performance on a multi-core machine.
I allocated a ByteBuffer for each thread and had each thread process its own part of the data. The final performance is slower than with a single thread, even though I have 8 logical processors. It is also very inconsistent: sometimes the same input takes twice as long to process, or more. Why is that? The data is loaded into memory first, so no further IO operations are performed.
I allocate my ByteBuffers using MappedByteBuffer because it's faster than ByteBuffer.wrap():
public ByteBuffer getByteBuffer() throws IOException
{
    File binaryFile = new File("...");
    FileChannel binaryFileChannel = new RandomAccessFile(binaryFile, "r").getChannel();
    return binaryFileChannel.map(FileChannel.MapMode.READ_ONLY, 0, binaryFileChannel.size());
}
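The byteBufferRanges used below come from splitting that buffer into even, independent slices, one per thread. Roughly like this (a simplified sketch of the splitting, not the exact code; ByteBufferRange is reduced here to a holder for a per-thread slice):

import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class ByteBufferRange
{
    private final ByteBuffer slice;

    public ByteBufferRange(ByteBuffer slice)
    {
        this.slice = slice;
    }

    public ByteBuffer getSlice()
    {
        return slice;
    }

    // Split the buffer into 'parts' independent slices of (nearly) equal size.
    public static List<ByteBufferRange> split(ByteBuffer buffer, int parts)
    {
        List<ByteBufferRange> ranges = new ArrayList<>(parts);
        int chunkSize = buffer.capacity() / parts;
        for (int i = 0; i < parts; i++)
        {
            int start = i * chunkSize;
            // The last slice picks up any remainder so the whole buffer is covered.
            int end = (i == parts - 1) ? buffer.capacity() : start + chunkSize;
            ByteBuffer duplicate = buffer.duplicate();
            duplicate.position(start);
            duplicate.limit(end);
            // slice() gives each thread its own position/limit, so threads never
            // touch each other's buffer state.
            ranges.add(new ByteBufferRange(duplicate.slice()));
        }
        return ranges;
    }
}

The list is created once up front, e.g. byteBufferRanges = ByteBufferRange.split(getByteBuffer(), threadsCount).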
I do my concurrent processing using Executors:
int threadsCount = Runtime.getRuntime().availableProcessors();
ExecutorService executorService = Executors.newFixedThreadPool(threadsCount);
ExecutorCompletionService<String> completionService = new ExecutorCompletionService<>(executorService);

for (ByteBufferRange byteBufferRange : byteBufferRanges)
{
    Callable<String> task = () ->
    {
        performTask(byteBufferRange);
        return null;
    };
    completionService.submit(task);
}

// Wait for all tasks to finish
for (ByteBufferRange ignored : byteBufferRanges)
{
    completionService.take().get();
}

executorService.shutdown();
Each concurrent task's performTask() uses its own ByteBuffer instance to read from the buffer, do calculations and so on; the tasks do not synchronize, write to shared state, or influence each other in any way (a stripped-down sketch follows). Any ideas what is going wrong, or is this not a good case for parallelization?
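Stripped down, performTask() amounts to something like this (the real calculation is more involved; summing bytes just stands in for it, and getSlice() refers to the simplified ByteBufferRange above):

private void performTask(ByteBufferRange byteBufferRange)
{
    ByteBuffer slice = byteBufferRange.getSlice();
    long result = 0;
    for (int i = 0; i < slice.limit(); i++)
    {
        // Absolute get(index): the slice is only read, never written,
        // and its position is never mutated.
        result += slice.get(i) & 0xFF;
    }
    // The result is consumed by the real code; omitted here.
}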
The same problem exists with ByteBuffer.wrap() and MappedByteBuffer alike.
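For completeness, the heap-based variant I also tried just reads the whole file into memory and wraps it, roughly like this (the method name is only for illustration):

public ByteBuffer getWrappedByteBuffer() throws IOException
{
    // The whole file is read into a heap array up front and then wrapped,
    // so no IO happens during processing.
    byte[] data = Files.readAllBytes(Paths.get("..."));
    return ByteBuffer.wrap(data);
}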