0

reading files from a directory and comparing the files in a nested for loop in java. first files compare but the rest says "not the same" even though it is same. I know I have to adjust something in my loop to prevent it from sending null values, any pointers

  File Directory = new File (location);
  File files[] = Directory.listFiles();

  for (File f : files)
  {   

     for (File g : files) 
     {
         br = new BufferedReader (new FileReader (f));
         while(( z = br.readLine()) != null)  s1+= z;

         br2 = new BufferedReader (new FileReader (g));   
         while ((y = br2.readLine()) != null) s2 += y;

         System.out.println();

      //     System.out.println(s1);   

      //   System.out.println(f.getName() + "=" + g.getName());

          if (s1.equals(s2)) {
         System.out.println(f.getName() + "=" + g.getName());
         System.out.println( "Content of both files are same");

     }
    else {
         System.out.println(f.getName() + "!=" + g.getName());
         System.out.println("Content of both files are not same"); 
     }

     }
Ravindra Gullapalli
  • 9,049
  • 3
  • 48
  • 70
user2119847
  • 41
  • 1
  • 4
  • 2
    Four answers already pointed out the bug, so I just want to comment on one pet peeve of mine: please don't use the `foreach` style loop here. It's lazy and is really inefficient. If you had files `f1, f2, f3` your code compares `f1` to `f1, f2, f3`, then `f2` to `f1, f2, f3` then `f3` to `f1, f2, f3`. 6 out of 9 comparisons were pointless. Either they compared a file against itself, or just switch `s1` and `s2`. The longer your list, the more bad comparisons you make. Replacing your loops with `for (int i=0; i – Windle Feb 28 '13 at 14:48

5 Answers5

4

You keep adding to the same s1 and s2, which means after the first couple of files you'll always have the previous files' contents in there. You'd probably want to clear them when you open the files.

Also, you should probably move the reading of f in the outer loop. There's no point reading it every time.

There are other ways to make this faster, for instance hashing the contents of each file, and then comparing hashes before actually starting to compare each pair of files, or more easily, comparing the sizes of the files first -- two files having different sizes (as given by e.g. f.length() and g.length()) will never be the same.

(Editing to answer comment)

If you want to delete one of two identical files, you might want to follow Windle's comment to make sure you never compare the same pair of files twice, and then you can always delete f if f and g are the same. To delete a file, use File.delete().

As for copying files, you can try looking at this: Standard concise way to copy a file in Java? To create the name of the destination file, you can use this constructor.

Community
  • 1
  • 1
Vlad
  • 18,195
  • 4
  • 41
  • 71
  • thanks for the in put.....what if I want to delete the file if there are the same and or copy the file to a different folder... can u please show both options – user2119847 Mar 01 '13 at 04:25
2

I don't see s1 and s2 declared anywhere. They shoud be declared inside the inner foreach loop. Else, you'll concatenate the contents ef every file into the ses Strings.

Your algorithm is very inefficient, and doesn't take into account new lines, but that's another story.

JB Nizet
  • 678,734
  • 91
  • 1,224
  • 1,255
1

I have tried modifying your solution to keep it correct and fast. Try this.

StringBuffer s1 = new StringBuffer();
StringBuffer s2 = new StringBuffer();
for (int i = 0 ; i < files.length ; i++ )
 {   
     File f = files[i];
     s1 = new StringBuffer();
     br = new BufferedReader (new FileReader (f));
     while(( z = br.readLine()) != null)  s1.append(z);


  for (int j = i+1 ; j < files.length ; j++ )
    {
File g = files[j]  ;         
     s2 = new StringBuffer();
     br2 = new BufferedReader (new FileReader (g));   
     while ((y = br2.readLine()) != null) s2.append(y);

     System.out.println(" ");

     if (s1.equals(s2)) {
        System.out.println(f.getName() + "=" + g.getName());
        System.out.println( "Content of both files are same");

        // To write file to a new directory pass the new path and the file as String to the method as given below.
        writeToFile(newPath, s2);

        // To delete the file use the below statement.
        g.delete();
                 }
    else {
        System.out.println(f.getName() + "!=" + g.getName());
       System.out.println("Content of both files are not same"); 
     }

 }



 private void writeToFile(String fileName, String data) throws IOException{
     FileWriter fstream = new FileWriter(fileName);
      BufferedWriter out = new BufferedWriter(fstream);
      out.write(data);
      out.flush();
      out.close();
}
user1760178
  • 6,277
  • 5
  • 27
  • 57
  • That still makes N comparisons when you only need N / 2. Why compare A with B and B with A? And it still use String concatenations in loops which will ruin the performance. – JB Nizet Feb 28 '13 at 14:49
  • I understand the problem with the string concatenation. But how can reduce this to N/2 comparisions? – user1760178 Feb 28 '13 at 15:30
  • The inner loop should start at the index of the outer loop + 1. Suppose the files are A, B and C. Your algorithm compares A with A (pointless: always same), A with B, A with C, B with A (pointless: already done), B with B (pointless: always same), B with C, C with A (pointless, already done), C with B (pointless: already done), C with C (pointless: always same). A better algorihm would compare A with B, A with C, and B with C. – JB Nizet Feb 28 '13 at 15:37
  • thanks a lot for the input.....what if I want to delete the file if there are the same and or copy the file to a different folder... can u please show both options – user2119847 Mar 01 '13 at 04:24
  • I have edited the answer with both the options. To delete or to rite to a different directory. – user1760178 Mar 01 '13 at 14:22
1

You keep adding lines to s1 and s2, once you hit the first files that are not the same s1 and s2 will never be the same again. In addition, there is no real need for this concatenation, why not just compare two lines at a time and break and return false on the first lines that are not equal?

Doron Manor
  • 576
  • 4
  • 5
0

Lets take this example. Suppose you have 4 files in the directory: A, B, C and D.

What your code is trying to do is to compare every file in the location with every file in the same directory.

Which means A is compared with A, B, C and D. B is compared with A, B, C and D and so on.

With this example, the only cases where files are found equal are when A is compared with A, B is compared with B..

So, out of total 16 comparisons happening here, 4 of them will result in files being equal, and for the rest they are marked as unequal.

As a result you should expect more of "not same" than "same" outputs.

  • This is incorrect. You missed the bug that `s1` and `s2` don't get reset when needed. That means that this "works" as long as the files are the same, but always returns false once there is a file that doesn't match the previous one. See all the other answers. – Windle Feb 28 '13 at 14:33
  • This explains why OP finds more "not same" outputs. OP had a problem in their understanding that they should expect only "same" outputs. Finding bugs is the next step, which others have already done well. –  Feb 28 '13 at 14:41
  • If the code worked as intended, using your example files, you would 4 "same" output at a *minimum*. If there were files that were actually duplicates, that number would increase. – Windle Feb 28 '13 at 14:51