15

I'm trying to write a function which compares the content of two files.

I want it to return 1 if files are the same, and 0 if different.

ch1 and ch2 work as buffers, and I used fgets to read the content of my files.

I think there is something wrong with the EOF handling, but I'm not sure. The files are given on the command line.

P.S. It works with files smaller than 64KB, but not with larger ones (700MB movies, for example, or 5MB .mp3 files).

Any ideas on how to fix this?

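#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if both streams have identical content, 0 otherwise. */
int compareFile(FILE* file_compared, FILE* file_checked)
{
    bool diff = 0;
    int N = 65536;
    char* b1 = (char*) calloc (1, N+1);
    char* b2 = (char*) calloc (1, N+1);
    size_t s1, s2;

    /* Treat an allocation failure as "not the same". */
    if (!b1 || !b2) {
        free(b1);
        free(b2);
        return 0;
    }

    do {
        /* fread returns the number of bytes actually read (at most N). */
        s1 = fread(b1, 1, N, file_compared);
        s2 = fread(b2, 1, N, file_checked);

        /* Compare only the bytes that were read, not the whole buffer. */
        if (s1 != s2 || memcmp(b1, b2, s1)) {
            diff = 1;
            break;
        }
    } while (!feof(file_compared) || !feof(file_checked));

    free(b1);
    free(b2);

    if (diff) return 0;
    else return 1;
}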

EDIT: I've improved this function based on your answers. However, it only compares the first buffer. I figured out that it stops reading the file when it reaches a 0x1A character (attached file). How can we make it work?

EDIT2: Task solved (working code attached). Thanks to everyone for the help!
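
The 0x1A symptom usually points to text-mode reads on Windows, where that byte (Ctrl-Z) is treated as end of file. A minimal sketch, assuming the caller controls how the files are opened; `compareByPath` is a hypothetical helper and is not the code attached to the question:

```cpp
#include <cstdio>

// Hypothetical helper (not the code attached to the question).
// Opens both files in binary mode ("rb") so that a 0x1A byte is read as
// ordinary data instead of being treated as end of file.
// Returns 1 if the contents match, 0 otherwise (including open failures).
int compareByPath(const char* path1, const char* path2)
{
    std::FILE* f1 = std::fopen(path1, "rb");
    std::FILE* f2 = std::fopen(path2, "rb");
    int same = (f1 && f2) ? 1 : 0;

    while (same) {
        const int c1 = std::fgetc(f1);
        const int c2 = std::fgetc(f2);
        if (c1 != c2) {
            same = 0;          // bytes differ, or one file ended first
        } else if (c1 == EOF) {
            break;             // both files ended together
        }
    }

    if (f1) std::fclose(f1);
    if (f2) std::fclose(f2);
    return same;
}
```

Reading one byte at a time with fgetc keeps the sketch short; stdio still buffers the underlying reads.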

knoxgon
Chris

6 Answers

38

If you can give up a little speed, here is a C++ way that requires little code:

#include <fstream>
#include <iterator>
#include <string>
#include <algorithm>

bool compareFiles(const std::string& p1, const std::string& p2) {
  std::ifstream f1(p1, std::ifstream::binary|std::ifstream::ate);
  std::ifstream f2(p2, std::ifstream::binary|std::ifstream::ate);

  if (f1.fail() || f2.fail()) {
    return false; //file problem
  }

  if (f1.tellg() != f2.tellg()) {
    return false; //size mismatch
  }

  //seek back to beginning and use std::equal to compare contents
  f1.seekg(0, std::ifstream::beg);
  f2.seekg(0, std::ifstream::beg);
  return std::equal(std::istreambuf_iterator<char>(f1.rdbuf()),
                    std::istreambuf_iterator<char>(),
                    std::istreambuf_iterator<char>(f2.rdbuf()));
}

By using istreambuf_iterators you push the buffer size choice, actual reading, and tracking of eof into the standard library implementation. std::equal returns when it hits the first mismatch, so this should not run any longer than it needs to.

This is slower than Linux's cmp, but it's very easy to read.
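
The same iterator pattern works on any std::istream, which makes it easy to try without touching the disk. A small sketch, assuming in-memory streams as input; `sameContent` is a hypothetical name. It uses the four-iterator std::equal overload (C++14), so unequal lengths are reported as a mismatch without a separate size check:

```cpp
#include <algorithm>
#include <istream>
#include <iterator>
#include <sstream>

// Hypothetical helper: compares two streams byte by byte.
// std::equal stops at the first mismatch or when either stream runs out.
bool sameContent(std::istream& a, std::istream& b)
{
    std::istreambuf_iterator<char> end;
    return std::equal(std::istreambuf_iterator<char>(a), end,
                      std::istreambuf_iterator<char>(b), end);
}
```

For example, two `std::istringstream` objects built from the same string compare equal, while "abc" and "abcd" do not.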

akim
mtrw
  • @Zhang - if you use `istreambuf_iterator` you get one char at a time, yes. The internal implementation reads multiple characters at a time. If you look at https://github.com/gcc-mirror/gcc/blob/41d6b10e96a1de98e90a7c0378437c3255814b16/libstdc%2B%2B-v3/include/bits/streambuf.tcc for instance, it looks like there is a buffer copy, and the buffer size depends on the instantiated type. But I'm not all that experienced in looking at the internal implementations so you may want to research this further. – mtrw Apr 28 '20 at 15:08
11

Here's a C++ solution. It seems appropriate since your question is tagged as C++. The program uses ifstreams rather than FILE*s. It also shows you how to seek on a file stream to determine a file's size. Finally, it reads blocks of 4096 bytes at a time, so large files will be processed as expected.

// g++ -Wall -Wextra equifile.cpp -o equifile.exe

#include <iostream>
using std::cout;
using std::cerr;
using std::endl;

#include <fstream>
using std::ios;
using std::ifstream;

#include <exception>
using std::exception;

#include <algorithm>  // std::min
#include <cstring>
#include <cstdlib>
using std::exit;
using std::memcmp;

bool equalFiles(ifstream& in1, ifstream& in2);

int main(int argc, char* argv[])
{
    if(argc != 3)
    {
        cerr << "Usage: equifile.exe <file1> <file2>" << endl;
        exit(-1);
    }

    try {
        ifstream in1(argv[1], ios::binary);
        ifstream in2(argv[2], ios::binary);

        if(equalFiles(in1, in2)) {
            cout << "Files are equal" << endl;
            exit(0);
        }
        else
        {
            cout << "Files are not equal" << endl;
            exit(1);
        }

    } catch (const exception& ex) {
        cerr << ex.what() << endl;
        exit(-2);
    }

    return -3;
}

bool equalFiles(ifstream& in1, ifstream& in2)
{
    // A stream that failed to open cannot be compared.
    if(!in1 || !in2)
        return false;

    ifstream::pos_type size1, size2;

    size1 = in1.seekg(0, ifstream::end).tellg();
    in1.seekg(0, ifstream::beg);

    size2 = in2.seekg(0, ifstream::end).tellg();
    in2.seekg(0, ifstream::beg);

    if(size1 != size2)
        return false;

    static const size_t BLOCKSIZE = 4096;
    size_t remaining = size1;

    while(remaining)
    {
        char buffer1[BLOCKSIZE], buffer2[BLOCKSIZE];
        size_t size = std::min(BLOCKSIZE, remaining);

        in1.read(buffer1, size);
        in2.read(buffer2, size);

        if(0 != memcmp(buffer1, buffer2, size))
            return false;

        remaining -= size;
    }

    return true;
}
jww
  • Why don't you check whether the files exist? Also, you only compare **4KB** of data from both files, which is not enough. You should not use the stack here; use dynamic memory instead and free it afterwards. – Haseeb Mir Nov 27 '18 at 18:48
  • @HaSeeBMiR - I think your analysis is not quite correct. For example, more than the first 4KB are verified since the read is happening in a loop. In fact the entire files are read because of the loop. – jww Nov 27 '18 at 22:55
  • But why do you compare using a buffer of BLOCKSIZE? Why not compare the whole buffer of size1 at once with memcmp? – Haseeb Mir Nov 28 '18 at 16:28
10

When the files are binary, use memcmp not strcmp as \0 might appear as data.
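
A small illustration of the difference, using two made-up in-memory buffers that are identical up to an embedded \0 and differ after it:

```cpp
#include <cstring>

// Two 6-byte buffers: equal up to the embedded '\0', different after it.
const char bufA[6] = {'a', 'b', '\0', 'c', 'd', 'e'};
const char bufB[6] = {'a', 'b', '\0', 'x', 'y', 'z'};

// strcmp stops at the first '\0', so it only sees "ab" in both buffers.
bool equalAsStrings() { return std::strcmp(bufA, bufB) == 0; }

// memcmp compares all 6 bytes and notices the difference.
bool equalAsBytes() { return std::memcmp(bufA, bufB, sizeof bufA) == 0; }
```

Here `equalAsStrings()` returns true while `equalAsBytes()` returns false.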

George Kastrinis
9

Since you've allocated your arrays on the stack, they hold indeterminate values; they aren't zeroed out.

Secondly, strcmp only compares up to the first null byte (\0), which, in a binary file, won't necessarily be at the end of the file. Therefore you should really be using memcmp on your buffers. But again, this will give unpredictable results if you compare the whole buffers, because they were allocated on the stack: even if you compare two files that are the same, the ends of the buffers past the EOF may not be the same, so memcmp can still report false results (i.e., it will most likely report that the files are not the same when they are, because of the random values at the end of the buffers past each respective file's EOF).

To get around this issue, you should first measure the length of each file in bytes (for example by iterating through it), then use malloc or calloc to allocate the buffers you're going to compare, and fill those buffers with the actual file contents. Then you should be able to make a valid comparison of the binary contents of each file. You'll also be able to work with files larger than 64K at that point, since you're dynamically allocating the buffers at run-time.
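
A rough sketch of that approach, assuming the files fit in memory. The size is taken by seeking to the end rather than by iterating, and std::vector stands in for malloc/calloc so the buffer is freed automatically. `loadFile` and `sameFile` are hypothetical names:

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads a whole file into a buffer sized to the file.
// Returns false if the file cannot be opened or fully read.
bool loadFile(const std::string& path, std::vector<char>& out)
{
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) return false;

    const std::streamoff size = in.tellg();  // opened at the end, so this is the length
    if (size < 0) return false;
    out.resize(static_cast<std::size_t>(size));

    in.seekg(0, std::ios::beg);
    if (size > 0) in.read(out.data(), size);
    return static_cast<bool>(in);
}

// The buffers hold exactly the file contents, so there is no uninitialized
// tail to confuse the comparison; vector's operator== checks size, then bytes.
bool sameFile(const std::string& p1, const std::string& p2)
{
    std::vector<char> a, b;
    return loadFile(p1, a) && loadFile(p2, b) && a == b;
}
```

For very large files (such as the 700MB case in the question) a block-wise loop is likely the better fit, since this version holds both files in memory at once.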

Jason
4

Switch's code looks good to me, but if you want an exact comparison the while condition and the return need to be altered:

int compareFile(FILE* f1, FILE* f2) {
  const int N = 10000;  // constant bound, so the arrays below are not VLAs in C++
  char buf1[N];
  char buf2[N];

  do {
    size_t r1 = fread(buf1, 1, N, f1);
    size_t r2 = fread(buf2, 1, N, f2);

    if (r1 != r2 ||
        memcmp(buf1, buf2, r1)) {
      return 0;  // Files are not equal
    }
  } while (!feof(f1) && !feof(f2));

  return feof(f1) && feof(f2);
}
Awais Qarni
RobisonMD
3

Better to use fread and memcmp to avoid \0 character issues. Also, the !feof checks really should be || instead of &&, since there's a small chance that one file is bigger than the other and the smaller file's size is a multiple of your buffer size.

int compareFile(FILE* f1, FILE* f2) {
  const int N = 10000;  // constant bound, so the arrays below are not VLAs in C++
  char buf1[N];
  char buf2[N];

  do {
    size_t r1 = fread(buf1, 1, N, f1);
    size_t r2 = fread(buf2, 1, N, f2);

    if (r1 != r2 ||
        memcmp(buf1, buf2, r1)) {
      return 0;
    }
  } while (!feof(f1) || !feof(f2));

  return 1;
}
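
One way to sidestep the || versus && question is to stop on a short read instead of calling feof at all. A sketch under that assumption; `compareStreams` is a hypothetical name, and a read error on either stream is reported as a mismatch:

```cpp
#include <cstdio>
#include <cstring>

// Hypothetical variant: returns 1 if both streams yield identical bytes, else 0.
int compareStreams(std::FILE* f1, std::FILE* f2)
{
    static const std::size_t N = 4096;
    char buf1[N];
    char buf2[N];
    std::size_t r1, r2;

    do {
        r1 = std::fread(buf1, 1, N, f1);
        r2 = std::fread(buf2, 1, N, f2);
        if (r1 != r2 || std::memcmp(buf1, buf2, r1) != 0) {
            return 0;  // different lengths or different bytes
        }
    } while (r1 == N);  // a full block means there may be more data

    // Both streams stopped at the same offset; make sure neither stopped on an error.
    return (std::ferror(f1) || std::ferror(f2)) ? 0 : 1;
}
```

If one file is exactly N bytes and the other is longer, the first pass reads a full block from both, and the second pass sees 0 bytes against a nonzero count, so the function returns 0.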
Switch