
My algorithm finds duplicate files by comparing every item in the list with every other item, one by one.
For example, with a list of 5 items:

if position[0] is equal to position[1], [2], [3], [4]
if position[1] is equal to position[0], [2], [3], [4]
if position[2] is equal to position[0], [1], [3], [4]
if position[3] is equal to position[0], [1], [2], [4]
if position[4] is equal to position[0], [1], [2], [3]

Now imagine a list of 3000 items: that is roughly 9 million pairwise comparisons (3000 × 2999), and it really does take several minutes.

    // pathList is an ArrayList of all file paths in a specific folder

    for (int pos = 0; pos < pathList.size(); pos++) {
        // Start at pos + 1: the pair (i, pos) was already checked as (pos, i)
        for (int i = pos + 1; i < pathList.size(); i++) {
            if (FilesAreEqual(new File(pathList.get(pos)), new File(pathList.get(i)))) {
                if (!mDuplicateList.contains(pathList.get(pos))
                        && !mDuplicateList.contains(pathList.get(i))) {
                    mDuplicateList.add(pathList.get(i));
                }
            }
        }
    }

Even when comparing file sizes first, it's still slow:

    boolean FilesAreEqual(File file1, File file2) {
        // Files of different lengths cannot have the same content
        if (file1.length() != file2.length()) return false;
        try {
            // FileUtils is assumed to be org.apache.commons.io.FileUtils
            return FileUtils.contentEquals(file1, file2);
        } catch (IOException e) {
            e.printStackTrace();
            return false;
        }
    }

I want to make it faster, like some Play Store apps. I have no idea how they can scan more than 15000 files in seconds.
Can you tell me a way to scan a folder as fast as those apps do, or help me improve my algorithm?

  • You could try to use a [`java.nio.file.FileVisitor`](https://docs.oracle.com/javase/7/docs/api/java/nio/file/FileVisitor.html), but I really don't know if it is fast enough for you and I think it may not be supported by older APIs. – deHaar Oct 11 '19 at 08:03
  • Can you check this: https://softwareengineering.stackexchange.com/questions/202639/finding-duplicate-files And also: https://www.google.com/search?q=java+fast+duplicate+file+finder&oq=java+fast+duplicate+file+finder&aqs=chrome..69i57.4295j0j4&sourceid=chrome&ie=UTF-8 You should also try to keep this work off the main thread. Thanks. – Arda Kazancı Oct 11 '19 at 08:41
  • You can probably get a more efficient algorithm if you set "int i = pos + 1". You have already checked positions 0-1, for example, so you don't need to check 1-0 again. As another approach, using a HashMap might be the best solution, check here https://stackoverflow.com/questions/40038729/check-duplicate-file-content-using-java – Harvey Oct 11 '19 at 09:31
  • Ok guys, I will check all your answers. Thanks. – adsstopdev Oct 12 '19 at 03:37
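
For reference, the HashMap idea from the comments could look roughly like the sketch below. It is only a sketch under some assumptions: the class name `DuplicateFinder` and the helpers `hashOf` and `findDuplicates` are made up for illustration, SHA-256 is just one reasonable choice of hash, and two files with the same size and the same hash are treated as equal (a final `FileUtils.contentEquals` check could be added if that is not strict enough). Each file is read at most once, so the work grows with the number of files rather than with the number of pairs. It uses only `java.io` and `java.security`, since `java.nio.file` may not be available on older Android APIs.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DuplicateFinder {

    // Hypothetical helper: hex-encoded SHA-256 of a file, read in 8 KiB chunks.
    static String hashOf(File file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        try (InputStream in = new FileInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Hypothetical helper: returns every path whose content matches
    // an earlier path in pathList.
    static List<String> findDuplicates(List<String> pathList)
            throws IOException, NoSuchAlgorithmException {
        // Pass 1: group paths by file size. A file whose size is unique
        // cannot have a duplicate, so it never needs to be read.
        Map<Long, List<String>> bySize = new HashMap<Long, List<String>>();
        for (String path : pathList) {
            long size = new File(path).length();
            List<String> bucket = bySize.get(size);
            if (bucket == null) {
                bucket = new ArrayList<String>();
                bySize.put(size, bucket);
            }
            bucket.add(path);
        }

        // Pass 2: hash only the files that share their size with another file.
        List<String> duplicates = new ArrayList<String>();
        for (List<String> bucket : bySize.values()) {
            if (bucket.size() < 2) {
                continue;
            }
            Map<String, String> firstSeen = new HashMap<String, String>();
            for (String path : bucket) {
                String hash = hashOf(new File(path));
                if (firstSeen.containsKey(hash)) {
                    duplicates.add(path); // same size and same hash as an earlier file
                } else {
                    firstSeen.put(hash, path);
                }
            }
        }
        return duplicates;
    }
}
```

Calling `DuplicateFinder.findDuplicates(pathList)` would then take the place of the nested loops. On Android it should still run on a background thread, because it does file I/O.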
