
After reading about ForkJoinPool, I tried an experiment to test how fast ForkJoinPool actually is compared to plain recursion.

I counted the number of files in a folder recursively, and to my surprise, plain recursion performed way better than ForkJoinPool.

Here's my code.

Recursive Task

class DirectoryTask extends RecursiveTask<Long> {

    private final Directory directory;

    DirectoryTask(Directory directory) {
        this.directory = directory;
    }

    @Override
    protected Long compute() {
        List<RecursiveTask<Long>> forks = new ArrayList<>();
        List<Directory> directories = directory.getDirectories();
        for (Directory child : directories) {
            DirectoryTask directoryTask = new DirectoryTask(child);
            forks.add(directoryTask);
            directoryTask.fork(); // schedule the subtask asynchronously
        }
        Long count = directory.getDocumentCount();
        for (RecursiveTask<Long> task : forks) {
            count += task.join(); // wait for each subtask and add its result
        }
        return count;
    }
}
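
For the task to run at all, it has to be submitted to a pool; that step isn't shown in the post, but it would presumably look something like this (root being the Directory built below):

// Submit the root task to the common ForkJoinPool and block until
// the whole directory tree has been counted.
Long total = ForkJoinPool.commonPool().invoke(new DirectoryTask(root));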

Plain Recursion

private static Long getFileCount(Directory directory) {
    Long recursiveCount = 0L;
    List<Directory> directories = directory.getDirectories();
    if (null != directories) {
        for (Directory d : directories) {
            recursiveCount += getFileCount(d);
        }
    }
    return recursiveCount + directory.getDocumentCount();
}

Directory Object

class Directory {

    private final List<Directory> directories;
    private final Long documentCount;

    private Directory(List<Directory> directories, Long documentCount) {
        this.directories = directories;
        this.documentCount = documentCount;
    }

    List<Directory> getDirectories() {
        return directories;
    }

    Long getDocumentCount() {
        return documentCount;
    }

    static Directory fromFolder(File file) {
        if (!file.isDirectory()) {
            throw new IllegalArgumentException("Only directories are allowed");
        }
        List<Directory> children = new ArrayList<>();
        Long documentCount = 0L;
        String[] files = file.list();
        if (null != files) {
            for (String path : files) {
                File f = new File(file.getPath() + File.separator + path);
                if (f.isHidden()) {
                    continue; // skip hidden files and folders
                }
                if (f.isDirectory()) {
                    children.add(Directory.fromFolder(f));
                } else {
                    documentCount++;
                }
            }
        }
        return new Directory(children, documentCount);
    }
}
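
The post doesn't include the timing code that produced the numbers below; a crude sketch of what it might look like (the path is a placeholder, and a single cold run like this is exactly the kind of measurement the answers warn about):

import java.io.File;
import java.util.concurrent.ForkJoinPool;

class FileCountBenchmark {
    public static void main(String[] args) {
        // Placeholder path; point at any reasonably sized directory tree.
        Directory root = Directory.fromFolder(new File("/some/path"));

        long t0 = System.nanoTime();
        Long plain = getFileCount(root); // assumes the method above is in scope
        System.out.printf("Plain recursion: %d files in %dms%n",
                plain, (System.nanoTime() - t0) / 1_000_000);

        long t1 = System.nanoTime();
        Long forked = ForkJoinPool.commonPool().invoke(new DirectoryTask(root));
        System.out.printf("ForkJoinPool: %d files in %dms%n",
                forked, (System.nanoTime() - t1) / 1_000_000);
    }
}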

Results

  • Plain Recursion: 3ms
  • ForkJoinPool: 25ms

Where's the mistake here?

I am just trying to understand whether there is a particular threshold below which plain recursion is faster than a ForkJoinPool.

Rachit Sachdeva

2 Answers


Nothing in life comes for free. If you had to move one beer crate from your car to your apartment, what is quicker: carrying it there by hand, or first going to the shed to fetch the wheelbarrow so you can use it to move that one crate?

Creating thread objects is a "native" operation that goes down into the underlying Operating System to acquire resources there. That can be a rather costly operation.

Meaning: just throwing "more threads" at a problem doesn't automatically speed things up. On the contrary: when your task is mainly CPU-intensive, there might be only a small gain from doing things in parallel. When you are doing lots of IO, having multiple threads lets you do "less" waiting overall, thus improving your throughput.

In other words: Fork/Join requires considerable activity before it does the real job. Using it for computations that only require a few ms is simply overkill. Thus: you would be looking to "fork/join" operations that work on larger data sets.

For further reading, you might look at parallel streams. Those use the fork/join framework under the covers; and, surprise, it is a misconception to expect an arbitrary parallelStream() to be "faster" than an ordinary stream, too.
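
To illustrate (numbers vary by machine; this is an illustration, not a benchmark): for a tiny workload, the coordination cost of going parallel dwarfs the actual computation.

import java.util.stream.LongStream;

class TinyWorkloadDemo {
    public static void main(String[] args) {
        // Both lines compute the same sum. The parallel version additionally
        // pays for splitting the range and coordinating fork/join worker
        // threads, which for 1,000 elements costs far more than the adds.
        long sequential = LongStream.rangeClosed(1, 1_000).sum();
        long parallel = LongStream.rangeClosed(1, 1_000).parallel().sum();
        System.out.println(sequential == parallel); // true: same result
    }
}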

GhostCat
  • Thanks a lot for the clarification. So, what I understand from both the answers here (one from Stephen C) is that ForkJoinPool is beneficial when the task being performed actually takes long enough to compensate for the overhead of thread creation. – Rachit Sachdeva Apr 16 '17 at 05:40
  • Correct. Beyond that: you also have to be careful how/what you benchmark. It would not surprise me if execution time changes when you repeat your experiments multiple times. – GhostCat Apr 16 '17 at 07:15

There are multiple aspects to this:

  1. Is there a difference between serial (e.g. plain recursion) and parallel (e.g. forkjoin) solutions to the same problem?

  2. What is the scope for parallelizing file system access?

  3. What are the traps for measuring performance?

Answer to #1: Yes, there is a difference. Parallelism does not pay off for a problem that is too small. With a parallel solution, you need to account for the overheads of:

  • creating and managing threads
  • passing information from the parent thread to the child threads
  • returning results from the child threads to the parent thread
  • synchronized access to shared data structures
  • waiting for the slowest / last-finishing child thread to finish.

How these play out in practice depends on a variety of things ... including the size of the problem, and the opportunities for parallelism.
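
One standard way to keep those overheads in check, and the closest thing to the "threshold" the question asks about, is a sequential cutoff: fork only when the subproblem is big enough, and fall back to plain recursion below that. A sketch against the question's Directory class; the cutoff of 50 is an arbitrary illustration and would need tuning by measurement:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.RecursiveTask;

class ThresholdDirectoryTask extends RecursiveTask<Long> {

    // Arbitrary illustrative cutoff; the right value can only be found
    // by measuring on real data and hardware.
    private static final int SEQUENTIAL_CUTOFF = 50;

    private final Directory directory;

    ThresholdDirectoryTask(Directory directory) {
        this.directory = directory;
    }

    @Override
    protected Long compute() {
        List<Directory> children = directory.getDirectories();
        // Small subtree: plain recursion, no forking overhead at all.
        if (children.size() < SEQUENTIAL_CUTOFF) {
            return countSequentially(directory);
        }
        // Large subtree: fork one subtask per child, then join them.
        List<ThresholdDirectoryTask> forks = new ArrayList<>();
        for (Directory child : children) {
            ThresholdDirectoryTask task = new ThresholdDirectoryTask(child);
            forks.add(task);
            task.fork();
        }
        long count = directory.getDocumentCount();
        for (ThresholdDirectoryTask task : forks) {
            count += task.join();
        }
        return count;
    }

    private static long countSequentially(Directory dir) {
        long count = dir.getDocumentCount();
        for (Directory child : dir.getDirectories()) {
            count += countSequentially(child);
        }
        return count;
    }
}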

The answer to #2 is (probably) not as much as you would think. A typical file system is stored on a disk drive that has physical characteristics such as rotational latency and head seek time. These typically become the bottleneck, and unless you have a high-end storage system, there is not much scope for parallelism.

The answer to #3 is that there are lots of traps. And those traps can result in very misleading (i.e. invalid) performance results ... if you don't allow for them. One of the biggest traps is that JVMs take time to "warm up"; i.e. load classes, do JIT compilation, resize the heap, and so on.

Another trap that applies to benchmarks that do file system I/O is that a typical OS will do things like caching disk blocks and file / directory metadata. So the second time you access a file or directory it is likely to be faster than the first time.
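
A crude way to blunt both traps is to repeat each measurement many times and ignore the early runs; for anything serious, a dedicated harness such as JMH does this (and much more) properly. A sketch of the crude version:

// Repeats the work to let class loading, JIT compilation and OS caches
// settle before timing. A real benchmark should use a harness like JMH.
static long averageMillis(Runnable work) {
    for (int i = 0; i < 20; i++) {
        work.run(); // warm-up iterations, results discarded
    }
    int runs = 50;
    long start = System.nanoTime();
    for (int i = 0; i < runs; i++) {
        work.run();
    }
    return (System.nanoTime() - start) / runs / 1_000_000;
}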


Having said this, if you have a well-designed, high-performance file system (e.g. inodes held on SSDs), a well-designed application, and enough cores, it is possible to achieve extraordinary rates of file system scanning through parallelism. (For example, checking the modification timestamps on half a billion files in under 12 hours...)

Stephen C