One idea is to use the fork/join framework and group the items (files) into batches, so that each batch can be processed independently.
My suggestion is the following:
- First, filter out all files that do not exist; they would only occupy resources unnecessarily.
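As a rough sketch of that filtering step, assuming the candidates are available as a `File[]` (the class and method names here are hypothetical and not part of the original code), something like this could be run before any task is created:

```java
import java.io.File;
import java.util.Arrays;

public class ExistingFilesFilter {

    // Hypothetical helper: keeps only the entries that exist on disk.
    // File::isFile could be used instead to drop directories as well.
    static File[] keepExisting(File[] candidates) {
        return Arrays.stream(candidates)
                .filter(File::exists)
                .toArray(File[]::new);
    }

    public static void main(String[] args) {
        File[] candidates = {
            new File(System.getProperty("java.io.tmpdir")), // normally exists
            new File("no-such-file-7f3a9c.bin")             // assumed to be missing
        };
        File[] existing = keepExisting(candidates);
        System.out.println(existing.length + " of " + candidates.length + " candidates exist");
    }
}
```

The filtered array is what would then be handed to the task, so that no batch wastes time on missing files.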
The following pseudo-code demonstrates the algorithm that might help you out:
    import java.io.File;
    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.ForkJoinTask;
    import java.util.concurrent.RecursiveTask;

    // Analyzer is the asker's own type; it is assumed to expose analyze(File).
    public static class CustomRecursiveTask extends RecursiveTask<Integer> {
        private final Analyzer[] analyzers;
        private final int threshold;
        private final File[] files;
        private final int start; // inclusive
        private final int end;   // exclusive

        public CustomRecursiveTask(Analyzer[] analyzers,
                                   final int threshold,
                                   File[] files,
                                   int start,
                                   int end) {
            this.analyzers = analyzers;
            this.threshold = threshold;
            this.files = files;
            this.start = start;
            this.end = end;
        }

        @Override
        protected Integer compute() {
            final int filesToProcess = end - start;
            // "<=" with a minimum of 1 guarantees termination: a single file is never split.
            if (filesToProcess <= Math.max(1, threshold)) {
                return processSequentially();
            } else {
                // Split into [start, middle) and [middle, end); "end" is exclusive,
                // so the right half has to start at middle, not middle + 1.
                final int middle = start + (end - start) / 2;
                final ForkJoinTask<Integer> left =
                    new CustomRecursiveTask(analyzers, threshold, files, start, middle);
                final ForkJoinTask<Integer> right =
                    new CustomRecursiveTask(analyzers, threshold, files, middle, end);
                // Fork one half and run the other in the current worker thread.
                left.fork();
                return right.invoke() + left.join();
            }
        }

        private Integer processSequentially() {
            for (int i = start; i < end; i++) {
                File file = files[i];
                for (Analyzer analyzer : analyzers) {
                    analyzer.analyze(file);
                }
            }
            // Number of files handled by this batch.
            return end - start;
        }
    }
It can be used as follows:
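    public static void main(String[] args) {
        final Analyzer[] analyzers = new Analyzer[]{};
        final File[] files = new File[]{};
        // Keep the threshold at 1 or more so that small inputs are not split endlessly.
        final int threshold = Math.max(1, files.length / 5);
        // invoke() blocks until the whole task tree has finished. execute() returns
        // immediately, so main could exit before the daemon pool threads are done.
        final int result = ForkJoinPool.commonPool().invoke(
            new CustomRecursiveTask(analyzers, threshold, files, 0, files.length)
        );
        System.out.println("Result: " + result);
    }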
Note that, depending on your constraints, you can tune the task's constructor arguments so that the algorithm adapts to the number of files.
For example, you could choose a different threshold depending on the number of files:
    final int threshold;
    if (files.length > 100_000) {
        threshold = files.length / 4;
    } else {
        threshold = files.length / 8;
    }
You could also set the number of worker threads in the ForkJoinPool depending on the input size.
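A minimal sketch of that idea, assuming a made-up heuristic of one worker per 1,000 items, capped at the number of available cores. The `SizedPoolExample`, `parallelismFor` and `CountTask` names are hypothetical, and the task only counts items as a stand-in for the real analysis:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class SizedPoolExample {

    // Hypothetical heuristic: small inputs get one worker, larger ones up to one per core.
    static int parallelismFor(int itemCount) {
        int cores = Runtime.getRuntime().availableProcessors();
        if (itemCount < 1_000) {
            return 1;
        }
        return Math.max(1, Math.min(cores, itemCount / 1_000));
    }

    // Stand-in task: "processes" the range [start, end) by counting its items.
    static class CountTask extends RecursiveTask<Integer> {
        private final int start;
        private final int end;
        private final int threshold;

        CountTask(int start, int end, int threshold) {
            this.start = start;
            this.end = end;
            this.threshold = Math.max(1, threshold);
        }

        @Override
        protected Integer compute() {
            if (end - start <= threshold) {
                return end - start; // the real work would happen here
            }
            int middle = start + (end - start) / 2;
            CountTask left = new CountTask(start, middle, threshold);
            CountTask right = new CountTask(middle, end, threshold);
            left.fork();
            return right.compute() + left.join();
        }
    }

    static int run(int itemCount) {
        // A dedicated pool with explicit parallelism instead of the common pool.
        ForkJoinPool pool = new ForkJoinPool(parallelismFor(itemCount));
        try {
            return pool.invoke(new CountTask(0, itemCount, itemCount / 8));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("processed " + run(10_000) + " items");
    }
}
```

Whether a dedicated pool beats the common pool depends on the workload (for example, how much of the analysis is I/O-bound), so the numbers above are placeholders to be replaced by whatever measurements suggest.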
Measure, adjust, and modify; you will solve the problem eventually.
Hope that helps.
UPDATE:
If the result of the analysis is of no interest, you could replace the RecursiveTask with a RecursiveAction. As written, the pseudo-code returns a boxed Integer from every task, which adds auto-boxing overhead for a value that is never used.
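For illustration, a result-free variant might look roughly like the sketch below. It is not the original code: `BatchAction` is a hypothetical name, plain strings stand in for the files, and a `Consumer<String>` stands in for the analyzers.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

public class BatchAction extends RecursiveAction {
    private final String[] items;        // stand-in for the File[] array
    private final Consumer<String> work; // stand-in for running all analyzers
    private final int threshold;
    private final int start;             // inclusive
    private final int end;               // exclusive

    public BatchAction(String[] items, Consumer<String> work, int threshold, int start, int end) {
        this.items = items;
        this.work = work;
        this.threshold = Math.max(1, threshold); // never split a single item
        this.start = start;
        this.end = end;
    }

    @Override
    protected void compute() {
        if (end - start <= threshold) {
            for (int i = start; i < end; i++) {
                work.accept(items[i]);
            }
            return;
        }
        int middle = start + (end - start) / 2;
        // invokeAll forks both halves and waits for them; there is nothing to return or box.
        invokeAll(new BatchAction(items, work, threshold, start, middle),
                  new BatchAction(items, work, threshold, middle, end));
    }

    public static void main(String[] args) {
        String[] items = {"a.txt", "b.txt", "c.txt", "d.txt", "e.txt"};
        AtomicInteger handled = new AtomicInteger();
        ForkJoinPool.commonPool().invoke(
                new BatchAction(items, name -> handled.incrementAndGet(), 2, 0, items.length));
        System.out.println("handled " + handled.get() + " items");
    }
}
```

The counter in main is only there to make the effect visible; in the real use case the analyzers' side effects would be the whole point.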