I have a piece of program processing a lot of files, where for each files two things needs to be done: First, some piece of the file is read and processed, and then the resulting MyFileData
gets stored. The first part can be parallelized, the second can not.
Doing everything sequentially is very slow, as the CPU has to wait for the disk, then it works a bit, and then it issues another request, and waits again...
I did the following
class MyCallable implements Callable<MyFileData> {
MyCallable(File file) {
this.file = file;
}
public MyFileData call() {
return someSlowOperation(file);
}
private final File file;
}
for (File f : files) futures.add(executorService.submit(new MyCallable(f)));
for (Future<MyFileData> f : futures) sequentialOperation(f.get());
and it helped a lot. However, I'd like to improve two things:
The
sequentialOperation
gets executed in a fixed order instead of processing whatever result is available first. How can I change it?There are thousands of files to be processed and starting thousands of disk requests could lead to disk trashing. By using
Executors.newFixedThreadPool(10)
I've limited this number, however I'm looking for something better. Ideally it should be self-tuning, so that it works optimal on different computers (e.g., issues more requests when RAID and/or NCQ is available, etc.). I don't think it could be based on finding out the HW configuration, but measuring the processing speed and optimizing based on it should somehow be possible. Any idea?