
I have used recursion to search for a particular type of file (.pdf files in this example). My recursive algorithm visits every subfolder, but I found that it lacks performance when there are many nested sub-folders (sub-sub-folders, sub-sub-sub-folders, and so on). I want to know if there is a better algorithm for file searching.

Below is my recursive file-search code, using .pdf files as an example:

import java.io.File;

public class Find {
    public static void main(String[] args) {
        File f = new File("D:/");
        find(f);
    }

    public static void find(File f) {
        File[] list = f.listFiles();
        if (list == null) return;   // listFiles() returns null for unreadable paths
        for (File entry : list) {
            String name = entry.getName();
            // parentheses matter here: without them, && binds tighter than ||
            if (entry.isFile() && (name.contains(".pdf") || name.contains(".PDF")))
                System.out.println(entry.getAbsolutePath());
            if (entry.isDirectory())
                find(entry);
        }
    }
}

This code is about as fast as, or somewhat faster than, the search option in the file explorer. I want to know if there is any faster algorithm than this.

Kangkan Lahkar
  • *Recursion* is not an algorithm, it's an *implementation* choice. It seems that you have a search space and you need to explore it to find the file. So, unless there is a name-wise relationship between folders, you need to explore the whole space. – Arash Apr 05 '17 at 16:32
  • http://stackoverflow.com/questions/4852531/find-files-in-a-folder-using-java – prasanth Apr 05 '17 at 16:32
  • Use Files.walkFileTree if you are using jdk7 or more https://docs.oracle.com/javase/tutorial/essential/io/find.html – Elias Apr 05 '17 at 16:38
  • Possible duplicate of [Better file search algorithm than creating a list of files](http://stackoverflow.com/questions/16775723/better-file-search-algorithm-than-creating-a-list-of-files) – halileohalilei Apr 05 '17 at 16:40
  • I don't think changing search pattern (aka algorithm) will make a difference to the overall performance of scanning entire directory hierarchy. Performance is entirely constrained by disk performance for actually *reading* the directory structure. No way to improve that, other than replacing drive with SSD. – Andreas Apr 05 '17 at 16:41
  • Running a defragger such as [PerfectDisk](http://www.raxco.com/home/products/perfectdisk-pro) to collect directory entries at beginning of disk might make future scans faster, but that's overkill for this purpose. – Andreas Apr 05 '17 at 16:46
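The Files.walkFileTree approach mentioned in the comments can be sketched like this. This is a minimal sketch, not the questioner's code: the findPdfs helper, the WalkFind class name, and the hard-coded "D:/" root are illustrative. Its main practical advantage is the visitFileFailed hook, which lets the walk skip unreadable entries instead of aborting:

```java
import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.*;

public class WalkFind {
    // Collect all *.pdf paths under root (case-insensitive match on the name).
    static List<Path> findPdfs(Path root) throws IOException {
        List<Path> hits = new ArrayList<>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (file.getFileName().toString().toLowerCase().endsWith(".pdf"))
                    hits.add(file.toAbsolutePath());
                return FileVisitResult.CONTINUE;
            }
            @Override
            public FileVisitResult visitFileFailed(Path file, IOException e) {
                return FileVisitResult.CONTINUE; // skip unreadable entries
            }
        });
        return hits;
    }

    public static void main(String[] args) throws IOException {
        findPdfs(Paths.get("D:/")).forEach(System.out::println);
    }
}
```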

4 Answers


The problem with threading is that launching threads has a cost, so the speed-up from parallel file browsing has to outweigh the overhead of spawning one thread per folder.

This is a simple method that uses an explicit stack and a loop (the classical replacement for recursion):

// requires: import java.io.File; import java.util.Stack;
static boolean avoidRecursion(String target) {
    File currentDir = new File(System.getProperty("user.home"));
    Stack<File> dirs = new Stack<File>();
    dirs.push(currentDir);

    do {
        File[] entries = dirs.pop().listFiles();
        if (entries == null) continue;   // listFiles() returns null for unreadable paths
        for (File f : entries) {
            if (f.isDirectory())
                dirs.push(f);
            else if (f.getName().equals(target))
                return true;
        }
    } while (!dirs.isEmpty());
    return false;
}

Measure both approaches and choose the one that is faster.

bichito

Try the iterative way:

import java.io.File;
import java.util.Stack;

public class Find {
    public static void main(String[] args) {
        File f = new File("D:/");
        Stack<File> stack = new Stack<File>();
        stack.push(f);

        while (!stack.empty()) {
            f = stack.pop();
            File[] list = f.listFiles();
            if (list == null) continue;   // listFiles() returns null for unreadable paths
            for (File entry : list) {
                String name = entry.getName();
                if (entry.isFile() && (name.contains(".pdf") || name.contains(".PDF")))
                    System.out.println(entry.getAbsolutePath());
                if (entry.isDirectory())
                    stack.push(entry);
            }
        }
    }
}
Abdou
  • Thank you. This was helpful. Just a small correction: **f = stack.pop(); File []list = f2.listFiles();** should be replaced with f = (File) stack.pop(); File []list = f.listFiles(); – Kangkan Lahkar Apr 06 '17 at 09:14
  • I'm glad it was helpful. I will edit the answer (I have no idea why I put `f2` instead of `f` :p). – Abdou Apr 06 '17 at 09:58

Probably you could use multithreading.

For each folder you enter, you start a new thread. Even if you have more threads than CPU cores, it is not a problem, since Windows can schedule far more threads than cores.
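One way to bound the thread count, as the comments below recommend, is a fork/join pool, which reuses a fixed set of worker threads instead of spawning one thread per folder. This is a hypothetical sketch under that assumption; the ParallelFind and SearchTask names are illustrative:

```java
import java.io.File;
import java.util.*;
import java.util.concurrent.*;

public class ParallelFind {
    // Each directory becomes a fork/join subtask; the pool,
    // not one thread per folder, bounds concurrency.
    static class SearchTask extends RecursiveTask<List<String>> {
        private final File dir;
        SearchTask(File dir) { this.dir = dir; }

        @Override
        protected List<String> compute() {
            List<String> found = new ArrayList<>();
            File[] entries = dir.listFiles();
            if (entries == null) return found;   // unreadable directory
            List<SearchTask> subtasks = new ArrayList<>();
            for (File f : entries) {
                if (f.isDirectory()) {
                    SearchTask t = new SearchTask(f);
                    t.fork();                    // schedule subdirectory on the pool
                    subtasks.add(t);
                } else if (f.getName().toLowerCase().endsWith(".pdf")) {
                    found.add(f.getAbsolutePath());
                }
            }
            for (SearchTask t : subtasks)
                found.addAll(t.join());          // gather results from subtasks
            return found;
        }
    }

    public static void main(String[] args) {
        List<String> hits = new ForkJoinPool().invoke(new SearchTask(new File("D:/")));
        hits.forEach(System.out::println);
    }
}
```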

  • Unlikely to improve performance. More likely to *reduce* performance, since performance bottleneck is reading from disk, and trying to read from two places on disk at the same time will just kill performance. *(disclaimer: assuming non-SSD disk)* – Andreas Apr 05 '17 at 16:39
  • Also, you should be very careful and limit the number of threads. Creating thousands of threads is likely to hang up your entire OS. – Jaroslaw Pawlak Apr 05 '17 at 16:44
  • @JaroslawPawlak you are right, but keep in mind that each thread stops after finishing a folder :)... Mostly, folders do not contain many files, which is why the threads finish really fast too... – mirisbowring Apr 05 '17 at 20:54
  • But he could also create a specific number of threads and manage those... – mirisbowring Apr 05 '17 at 20:54

Use the Files.walk() method, which returns a Java 8 Stream. You can parallelize that computation quite easily by using a parallel stream.

Use the following convenient idiom in a try-with-resources statement:

try (Stream&lt;Path&gt; vals = Files.walk(rootPath)) { .... }

For rootPath, you can use Paths.get("root location") to actually get to the root location.
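Putting that idiom together, a minimal sketch might look like this (the findPdfs helper and StreamFind class name are illustrative; .parallel() opts the pipeline into a parallel stream):

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StreamFind {
    // Collect *.pdf files under root (case-insensitive) via Files.walk.
    static List<Path> findPdfs(Path root) throws IOException {
        // try-with-resources closes the stream, releasing open directory handles
        try (Stream<Path> vals = Files.walk(root)) {
            return vals.parallel()
                       .filter(Files::isRegularFile)
                       .filter(p -> p.getFileName().toString().toLowerCase().endsWith(".pdf"))
                       .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        findPdfs(Paths.get("D:/")).forEach(System.out::println);
    }
}
```

Note that Files.walk throws an exception if it hits an unreadable directory mid-walk, so for scanning a whole drive the walkFileTree variant with a visitFileFailed override is more robust.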

Avneet Paul