
I've decided to write a recursive program that writes the paths of all the files on my C drive into a .txt file; however, it is very slow.

I've read online that recursion is slow, but I can't think of any other way. Is there any way I can optimize this?

EDIT: I changed the deepInspect method to use a Stack instead of recursion, which slightly improved performance.

Here is the code:

import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Stack;

public class FileCount {

static long fCount = 0;

public static void main(String[] args) {
    System.out.println("Start....");
    long start = System.currentTimeMillis();
    File cDir = new File("C:\\");
    inspect(cDir);
    if (writer != null) {
        try {
            writer.close(); // flush any buffered output so the file is complete
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    System.out.println("Operation took : " + (System.currentTimeMillis() - start) + " ms");
}

private static void inspect(File cDir) {
    for (File f : cDir.listFiles()) {
        deepInspect(f);
    }
}

private static void deepInspect(File f) {
    Stack<File> stack = new Stack<File>();
    stack.push(f);
    while (!stack.isEmpty()) {
        File current = stack.pop();
        File[] children = current.listFiles(); // null for plain files and unreadable directories
        if (children != null) {
            for (File file : children) {
                stack.push(file);
            }
        }
        fCount++; // count this entry so the running total written below is meaningful
        writeData(current.getAbsolutePath());
    }
}

static FileWriter writer = null;

private static void writeData(String absolutePath) {
    if (writer == null)
        try {
            writer = new FileWriter("C:\\Collected\\data.txt");
        } catch (IOException e) {
            e.printStackTrace();
            return; // writer.write below would throw a NullPointerException otherwise
        }
    try {
        writer.write(absolutePath);
        writer.write("\r\n");//nwline
        writer.write("Files : " + fCount);
        writer.write("\r\n");
    } catch (IOException e) {
        e.printStackTrace();
    }
}

}
Vlad
  • Slow compared to what? I suspect that the bottleneck here is all the filesystem reads. It takes time to work through the hundreds of thousands of directory entries on a typical system. – slim Sep 22 '17 at 15:58
  • Your approach to re-using the FileWriter is a bit hacky, but at least you *do* re-use the FileWriter, so there should be no significant slowdown there. – slim Sep 22 '17 at 16:01
  • File IO operations tend to be pretty expensive. Right now you are creating a new FileWriter for every file. I would try storing all the filepaths in a List of Strings and then write them all at once at the end (see the sketch after this comment thread). Otherwise you should at least be reusing your FileWriter instead of constantly making a new one. – Steve Sep 22 '17 at 16:02
  • @Steve at first glance I thought he was opening a new FileWriter each time, but look, actually it's maintained as a static variable with an `if writer==null` around its initialiser. Naive, but it does the job. – slim Sep 22 '17 at 16:04
  • I suspect if you did a `C:\ > dir /s > Collected\data.txt` it would take around the same time. – slim Sep 22 '17 at 16:05
  • You may find you need to check each folder in case it is a symbolic link. – OldCurmudgeon Sep 22 '17 at 16:09
  • Related: https://stackoverflow.com/questions/159590/way-to-go-from-recursion-to-iteration -- but although it's not *false* to say that "recursion is slower than iteration", there's nuance to it and I think this program will be able to process the information faster than the disk can supply it. – slim Sep 22 '17 at 16:12
  • Final comment on this - any time you're wondering "Can I optimise this", you should *profile* to find where it's spending its time. – slim Sep 22 '17 at 16:16
  • @slim I tested the writing speed and in ~20 seconds I was able to write 1.2 GB of data (iterating from 1 to 50 mil and writing to file); in the same time, the program writes no more than 5-6 KB of data (if using recursion), so I do believe that it is the recursive method that slows it down – Vlad Sep 22 '17 at 17:36
  • @Vlad it's not the small output file that's the problem. It's the large amount of directory information that it's reading from all over the filesystem. – slim Sep 22 '17 at 19:26
  • @slim Thanks! Now I'll know what to (try to) optimize. – Vlad Sep 22 '17 at 19:44
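
A minimal sketch of Steve's collect-then-write suggestion from the comments, assuming the same C:\Collected\data.txt output file as the question (an editorial illustration, not part of the original post):

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class CollectThenWrite {

    public static void main(String[] args) throws IOException {
        // Phase 1: walk the tree and collect every path in memory.
        List<String> paths = new ArrayList<>();
        Deque<File> stack = new ArrayDeque<>(); // ArrayDeque is the usual replacement for java.util.Stack
        stack.push(new File("C:\\"));
        while (!stack.isEmpty()) {
            File current = stack.pop();
            File[] children = current.listFiles(); // null for plain files and unreadable directories
            if (children != null) {
                for (File child : children) {
                    stack.push(child);
                }
            }
            paths.add(current.getAbsolutePath());
        }
        // Phase 2: write everything in one pass through a single buffered writer.
        try (BufferedWriter writer = new BufferedWriter(new FileWriter("C:\\Collected\\data.txt"))) {
            for (String path : paths) {
                writer.write(path);
                writer.write("\r\n");
            }
        }
    }
}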

2 Answers


Java 8 provides a stream to process all files:

Files.walk(Paths.get("/"))
    .filter(Files::isRegularFile)
    .forEach(System.out::println);

You could add `parallel()` processing for improved performance:

Files.walk(Paths.get("/"))
    .parallel()
    .filter(Files::isRegularFile)
    .forEach(System.out::println);

I tried this under Linux, so you would need to replace "/" with "C:\\" and try it. Also, in my case it stops when it tries to read a directory I don't have access to, so you would need to handle that too if you are not running as admin.
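
Not part of the original answer, but one way to handle that access problem (a sketch, assuming Java 7+ NIO) is Files.walkFileTree with a visitor that skips entries it cannot read:

import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

public class WalkSkippingDenied {

    public static void main(String[] args) throws IOException {
        Files.walkFileTree(Paths.get("C:\\"), new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                System.out.println(file);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException exc) {
                // Called for files and directories that cannot be read
                // (e.g. AccessDeniedException); skip them instead of aborting.
                return FileVisitResult.CONTINUE;
            }
        });
    }
}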

Check out these benchmarks: https://github.com/brettryan/io-recurse-tests

Sebastian D'Agostino
  • Thanks for your suggestion, but I'm doing this for fun and practice, so I wanted to avoid using any methods that do it for me :) – Vlad Sep 22 '17 at 19:03
  • OK. Then just check the benchmarks I mentioned (https://github.com/brettryan/io-recurse-tests). I don't think you can improve that very much unless you use parallel processing because you have heavy IO throughput. – Sebastian D'Agostino Sep 22 '17 at 19:08
  • Parallel processing won't help if disk IO is the bottleneck. Indeed it might make things worse on a mechanical disk, as the processes fight over the disk head position. – slim Sep 22 '17 at 19:51
  • @slim That's not true. I/O operations are blocking, so using parallel processing allows you to save on that latency. Check the link in my answer; the benchmarks back this up. – Sebastian D'Agostino Sep 23 '17 at 21:26

I don't think recursion is the issue here. The main issue in your code is the file IO you are doing at every level: disk access is extremely costly compared to memory access. If you profile your code you should definitely see a huge spike in disk IO.

So, essentially, you want to reduce the number of disk writes. To do so you could keep an in-memory buffer of finite size, write your output into it, and flush the data to the file only when the buffer is full.

Doing this by hand, however, is considerably more work.
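
A minimal sketch of that buffering idea (editorial illustration, not from the original answer): java.io.BufferedWriter already provides exactly this kind of in-memory buffer, so the question's writer could be wrapped like this:

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

public class BufferedOutputSketch {

    public static void main(String[] args) throws IOException {
        // 64 KB in-memory buffer; bytes reach the disk only when the buffer
        // fills up or the writer is flushed/closed, which cuts the number of
        // physical writes dramatically.
        try (BufferedWriter writer = new BufferedWriter(
                new FileWriter("C:\\Collected\\data.txt"), 64 * 1024)) {
            writer.write("C:\\some\\example\\path"); // hypothetical entry, for illustration
            writer.write("\r\n");
        }
    }
}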

Avishek Bhattacharya