I am trying to write a program that reads two files one byte at a time (yes, I am aware of the heavy I/O overhead), but I am having trouble advancing the FILE pointer. I would like the program to compare both files byte by byte. I assumed getc would not be a viable option, because it would only work for chars, since chars are one byte. However, I am reading from two text files, and the text files could include numbers such as ints, doubles, etc. In that scenario I would like to grab each byte that makes up the int/double and compare it to the corresponding byte in the other file (a sequential byte-by-byte comparison).

Here is what I have so far:

#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include <time.h>

#define BUFFER_SIZE 1

unsigned char buffer1[BUFFER_SIZE];
unsigned char buffer2[BUFFER_SIZE];

int main()
{
    FILE *fp1, *fp2;
    int ch1, ch2;
    clock_t elapsed;
    char fname1[40], fname2[40];

    printf("Enter name of first file :");
    fgets(fname1, 40, stdin);
    while ( fname1[strlen(fname1) - 1] == '\n')
    {
        fname1[strlen(fname1) -1] = '\0';
    }

    printf("Enter name of second file:");
    fgets(fname2, 40, stdin);
    while ( fname2[strlen(fname2) - 1] == '\n')
    {
        fname2[strlen(fname2) -1] = '\0';
    }

    fp1 = fopen(fname1, "r");
    if ( fp1 == NULL )
    {
        printf("Cannot open %s for reading\n", fname1 );
        exit(1);
    }

    fp2 = fopen( fname2,  "r");
    if (fp2 == NULL)
    {
        printf("Cannot open %s for reading\n", fname2);
        exit(1);
    }

    elapsed = clock(); // get starting time

    /* Read BUFFER_SIZE bytes from each file into the buffers */
    size_t bytes_read1 = 0;
    size_t bytes_read2 = 0;

    bytes_read1 = fread(buffer1, sizeof(unsigned char), BUFFER_SIZE, fp1); 
    bytes_read2 = fread(buffer2, sizeof(unsigned char), BUFFER_SIZE, fp2); 

    printf("%c + in buffer 1\n", *buffer1);
    printf("%c + in buffer 2\n", *buffer2);

    fclose ( fp1 ); // close files
    fclose ( fp2 );

    elapsed = clock() - elapsed; // elapsed time
    printf("That took %.4f seconds.\n", (float)elapsed/CLOCKS_PER_SEC);
    return 0;
}

I am assuming buffer1 and buffer2 are the content of the one byte being read? Would I have to convert them to a number to compare them? I was thinking I could do the comparison as follows

(buffer1 ^ buffer2) == 0 

Then that would mean they are equal based on the XOR bitwise operation

Thanks for your help in advance

humblebeast
  • "getc would not be a viable option" -- You are very, very confused. How do you suppose "numbers such as ints, doubles, etc." are stored in a file? getc gets a byte and so does your fread but getc is a lot more efficient. `(buffer1 ^ buffer2) == 0` does the same as `buffer1 == buffer2` ... but both compare addresses, not bytes. – Jim Balter Jul 19 '14 at 18:31
  • @JimBalter more than one byte – humblebeast Jul 19 '14 at 18:33
  • "more than one byte" is just a sequence of bytes. – Jim Balter Jul 19 '14 at 18:34
  • "ch1 == EOF" will never be true ... and not just because you never set the variable. RTFM for how to detect EOF with fread. – Jim Balter Jul 19 '14 at 18:36
  • As I said, you are very very confused. Files are just sequences of bytes. The int will either be stored as a character string, or a sequence of 4 bytes (for 64-bit ints), depending on how you wrote it out. In any case, you compare the files byte by byte ... again, getc is no different from an fread of 1 byte. – Jim Balter Jul 19 '14 at 18:40
  • If file is binary, then `fopen(fp, "rb");` and `c = fgetc(fp);` will work for you. if not, use `fopen(fp, "r");` – ryyker Jul 19 '14 at 18:41
  • @JimBalter - Just a nit: sizeof(__int64) is 8 bytes. – ryyker Jul 19 '14 at 18:45
  • @ryyker Yeah, well, at least I knew it wasn't "two bytes". :-) I started programming in 1965 ... those damn ints keep getting bigger and bigger so I lose track ... – Jim Balter Jul 19 '14 at 18:47
  • "Suppose the first set of information being read from both files" -- the only "set of information" you are reading is bytes (chars ... same thing in C). If you want to read chars or numbers as units, then you need to PARSE the file. – Jim Balter Jul 19 '14 at 18:51
  • "So if getc reads in an int" -- getc reads a char or byte (same thing) ... your question says "Reading two files one byte at a time". If you want to do something different, explain what it is. – Jim Balter Jul 19 '14 at 18:53
  • Finally, see http://meta.stackexchange.com/questions/66377/what-is-the-xy-problem ... you haven't told us what your files contain or what you're really trying to achieve. – Jim Balter Jul 19 '14 at 18:56
  • "getc gets a byte and so does your fread" -- sorry, my mistake ... your fread gets 128 *or fewer* bytes. But no, you can't compare them by XORing the buffer addresses, you have to compare them byte-by-byte. The system buffers disk data, stdio buffers on top of that, you're buffering on top of that ... unnecessary and inefficient. – Jim Balter Jul 19 '14 at 19:00
  • You should change your moniker ... it doesn't fit. I'm out of here ... good luck. – Jim Balter Jul 19 '14 at 19:03

1 Answer

I have enjoyed the banter in the comments. Maybe time for an example.

Note: In a text file, an alphabetic character such as "a" is stored as the byte 'a' (97, or 0x61 in ASCII). A numeric character such as "2" is stored the same way, as '2' (50, or 0x32). A text file is just a sequence of alphanumeric, punctuation, and white-space characters that can be read one character at a time with fgetc().

Contrary to your assertion that fgetc() will not work for a byte-by-byte comparison, here is a simple example that shows it does. The code uses fgetc() and is followed by inputs and results for files with the same contents and for files with different contents:

#include <stdio.h>   //fopen, fgetc, fclose, printf, EOF
#include <stdbool.h> //bool, true, false

#define FILE1 "C:\\dev\\play\\file1.txt"
#define FILE2 "C:\\dev\\play\\file2.txt"

bool CompareFileByteByByte(const char *file1, const char *file2);

int main(void)
{
    if(CompareFileByteByByte(FILE1, FILE2))
    {
        printf("Files are equal\n");
    }
    else
    {
        printf("Files are NOT equal\n");
    }

    return 0;
}

bool CompareFileByteByByte(const char *file1, const char *file2)
{
    FILE *fp1 = NULL, *fp2 = NULL;
    bool results = false;

    int c1 = 0, c2 = 0;//note, even though fgetc reads one char from file,
                       //it returns int to accommodate EOF (a negative value)

    fp1 = fopen(file1, "r");
    fp2 = fopen(file2, "r");
    if(fp1 == NULL || fp2 == NULL)
    {
        //a file that cannot be opened is treated as "not equal"
        if(fp1) fclose(fp1);
        if(fp2) fclose(fp2);
        return false;
    }

    do
    {
        c1 = fgetc(fp1);
        c2 = fgetc(fp2);
        results = (c1 == c2);//also true when both files reach EOF together
    } while((c1 != EOF) && (c2 != EOF) && results);

    fclose(fp1);
    fclose(fp2);

    return results;
}

Given FILE1 FILE2: (both same)

Oringinal text...
...more text 123456
...more text 2.3456
...more text 3e12

Results: Files are equal

Given FILE1

Oringinal text...
...more text 123456
...more text 2.3456
...more text 3e12

And FILE2

Oringinal text...
...more text 123456
...more text 2.3456
...more text 4e12

Results: Files are NOT equal

ryyker
  • Thank you so much, most useful information I heard today – humblebeast Jul 19 '14 at 20:00
  • @humblebeast - after posting this answer, I looked at the recent history of your posts. It appears you have an interest at this point in comparing files. Have you seen these other approaches/discussions: ***[1](http://stackoverflow.com/a/20688284/645128)***, ***[2](http://www.dreamincode.net/forums/topic/236817-how-would-i-compare-two-files/)***, ***[3](http://objectmix.com/asm-x86-asm-370/166774-byte-byte-compare-duplicate-file-finder-killer.html)***. – ryyker Jul 19 '14 at 20:20