1

My Java application needs the ability to compare two different files in the filesystem and decide if their binary contents are the same or not.

Here is my current code:

package utils;
import java.io.*;

class compare { 
    public static void main(String args[]) throws IOException {
        FileInputStream file1 = new InputStream(args[0]);
        FileInputStream file2 = new InputStream(args[1]);

        try {
            if(args.length != 2)
                throw (new RuntimeException("Usage : java compare <filetoread> <filetoread>"));
            while (true) {
                int a = file1.read();
                int b = file2.read();
                if (a==-1) { 
                    System.out.println("Both the files have same content"); 
                }
                else{
                    System.out.println("Contents are different");
                }
            }
        }
        catch (Exception e) {
            System.out.println("Error: " + e);
        }
    }
}

Any tips or suggestions regarding how to make the comparison function correctly would be appreciated.

James T Snell
  • Use FileUtils for this. Very easy to implement. Example here: http://www.avajava.com/tutorials/lessons/whats-a-quick-way-to-tell-if-the-contents-of-two-files-are-identical-or-not.html. I'd post this as a real answer, but someone got confused and thought it's not a real question. – James T Snell May 25 '17 at 20:43

3 Answers

7

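The simplest way is to read the contents of each file into a string, e.g.

  // Read one file into a string; repeat for the second file (i is 0 or 1)
  FileInputStream fin = new FileInputStream(args[i]);
  BufferedReader myInput = new BufferedReader(new InputStreamReader(fin));
  StringBuilder sb = new StringBuilder();
  String thisLine;
  while ((thisLine = myInput.readLine()) != null) {
      // readLine() strips the line terminator, so add one back
      sb.append(thisLine).append('\n');
  }
  myInput.close();

Then call .equals() on the two resulting strings. Do you require more complex differencing capabilities?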

Ashish Aggarwal
Brian Agnew
  • No, the simpler the better. Can you tell me how to implement it? – Oct 02 '09 at 16:25
  • I'm not sure the answer isn't pretty much in the above – Brian Agnew Oct 02 '09 at 16:26
  • If the files are too big, you could always read them in fixed-size chunks and compare a chunk at a time. Actually this would be a more efficient solution even if the files aren't big enough to blow memory, as once you find one character different, they must be different, and there's no point reading more. – Jay Oct 02 '09 at 16:46
  • Also, if the files are or could be binary, you probably should read them into a byte array rather than a String. – Jay Oct 02 '09 at 16:47
  • The question says "2 text files" – Brian Agnew Oct 02 '09 at 16:56
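
The chunked, early-out comparison Jay describes in the comments could look roughly like the sketch below. It works on raw bytes, so it should suit binary files too. The class name `ChunkedFileCompare`, the method `sameContent`, and the 8 KB chunk size are illustrative choices, not anything from the original posts.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class ChunkedFileCompare {
    // Hypothetical chunk size; any reasonable buffer size works.
    private static final int CHUNK_SIZE = 8192;

    /** Returns true if both files hold exactly the same bytes. */
    public static boolean sameContent(Path first, Path second) throws IOException {
        // Files of different length can never be equal, so skip reading them.
        if (Files.size(first) != Files.size(second)) {
            return false;
        }
        try (InputStream in1 = Files.newInputStream(first);
             InputStream in2 = Files.newInputStream(second)) {
            byte[] chunk1 = new byte[CHUNK_SIZE];
            byte[] chunk2 = new byte[CHUNK_SIZE];
            while (true) {
                int n1 = fill(in1, chunk1);
                int n2 = fill(in2, chunk2);
                if (n1 != n2) {
                    return false; // one file ended before the other
                }
                if (n1 == 0) {
                    return true; // both files ended without a mismatch
                }
                for (int i = 0; i < n1; i++) {
                    if (chunk1[i] != chunk2[i]) {
                        return false; // early out on the first differing byte
                    }
                }
            }
        }
    }

    // Fills the buffer as far as possible; returns the number of bytes read (0 at end of file).
    private static int fill(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n == -1) {
                break;
            }
            total += n;
        }
        return total;
    }
}
```

Calling `ChunkedFileCompare.sameContent(Paths.get(args[0]), Paths.get(args[1]))` from `main` would then give the yes/no result the question asks for, without holding either file fully in memory.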
3
import java.io.*;

public class Testing {
    public static void main(String[] args) throws IOException {
        // Pass the arguments themselves, not the string literals "args[0]" and "args[1]"
        String s1 = readText(new File(args[0]));
        String s3 = readText(new File(args[1]));

        if (s3.equals(s1)) {
            System.out.println("Contents of both files are the same");
        } else {
            System.out.println("Contents of both files are not the same");
        }
    }

    // Reads a text file into one string, keeping a line break after each line
    private static String readText(File file) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }
}
fredley
2

Read the contents of the files and use the MessageDigest class to create an MD5 hash of the contents of each file. Then compare the two hashes. This has the advantage of working for binary files as well.
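
A minimal sketch of this approach, assuming the files are streamed through `MessageDigest` in 8 KB buffers. The class name `DigestFileCompare` and the methods `md5Of` and `sameDigest` are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class DigestFileCompare {
    /** Streams the file through an MD5 digest and returns the 16-byte hash. */
    public static byte[] md5Of(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("MD5");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                digest.update(buffer, 0, n); // feed only the bytes actually read
            }
        }
        return digest.digest();
    }

    /** Returns true if both files produce the same MD5 hash. */
    public static boolean sameDigest(Path first, Path second)
            throws IOException, NoSuchAlgorithmException {
        return MessageDigest.isEqual(md5Of(first), md5Of(second));
    }
}
```

Equal hashes only indicate that the files are very likely identical, so a byte-by-byte check is still needed when a definitive answer matters.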

Eric Petroelje
  • True, but isn't definitive. Two files could hash to the same value but be different. I suppose an MD5 hash is big enough and unpredictable enough that this is unlikely, but I'm always nervous of algorithms that work "most of the time". – Jay Oct 02 '09 at 16:45
  • 7
    A much bigger problem is that this method is fairly inefficient -- it always reads the entirety of both files, even if (for example) there's a difference in the first byte. Comparing blocks directly gives an 'early out', so you quit comparing as soon as you find a difference. Hashing is more useful if you're doing something like finding any duplicates across a file system, so you want to compare file X against tens of thousands of others. In this case, if you get a duplicate hash, you can compare the files themselves to verify that they're really the same. – Jerry Coffin Oct 02 '09 at 17:20
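
The duplicate-finding use case Jerry Coffin mentions might be sketched as follows: bucket files by hash first, then confirm each candidate pair by comparing the actual bytes. The class `DuplicateFinder` and its methods are hypothetical, and for brevity this version reads whole files into memory, which would not suit very large files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DuplicateFinder {
    /** Returns groups of two or more files whose bytes are identical. */
    public static List<List<Path>> duplicateGroups(List<Path> files)
            throws IOException, NoSuchAlgorithmException {
        // Step 1: bucket the files by the hex form of their MD5 hash.
        Map<String, List<Path>> byHash = new LinkedHashMap<>();
        for (Path file : files) {
            byHash.computeIfAbsent(md5Hex(file), k -> new ArrayList<>()).add(file);
        }
        // Step 2: inside each bucket, verify by comparing the real bytes,
        // because two different files can share a hash.
        List<List<Path>> groups = new ArrayList<>();
        for (List<Path> bucket : byHash.values()) {
            List<Path> remaining = new ArrayList<>(bucket);
            while (remaining.size() > 1) {
                Path first = remaining.remove(0);
                byte[] firstBytes = Files.readAllBytes(first);
                List<Path> group = new ArrayList<>();
                group.add(first);
                for (Iterator<Path> it = remaining.iterator(); it.hasNext();) {
                    Path other = it.next();
                    if (Arrays.equals(firstBytes, Files.readAllBytes(other))) {
                        group.add(other);
                        it.remove();
                    }
                }
                if (group.size() > 1) {
                    groups.add(group);
                }
            }
        }
        return groups;
    }

    // Hashes the whole file in one call and formats the result as lowercase hex.
    private static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        byte[] hash = MessageDigest.getInstance("MD5").digest(Files.readAllBytes(file));
        StringBuilder hex = new StringBuilder();
        for (byte b : hash) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

For a plain two-file comparison this is more work than comparing the files directly; the hashing step pays off only when many files have to be matched against each other.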