I'm trying to write a program that compares two files and shows all the differences.

The problems I'm having are:

  • The first line of the output is missing its first character.

  • The reported differences are wrong.

I have two input files.

file.txt

AAA
BBB
CCC
DDD
EEE

file2.txt

AAA
111
BBB
222
333
CCC
DDD
EEE
444

The output I'm getting (the first line is bugged) is:

11
BBB
222
333
CCC

And the output I want (without the first-line bug) is:

111
222
333
444

This is my current code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int compare(char *fname1, char *fname2)
{
    FILE *fp1 = fopen(fname1, "r"); 
    FILE *fp2 = fopen(fname2, "r");
    int ch1, ch2;

    if (fp1 == NULL) 
    {
        printf("Can't open %s", fname1);
        exit(1);
    } 
    else if (fp2 == NULL)
    {
        printf("Can't open %s", fname2);
        exit(1);
    } 
    else
    {
        ch1 = getc(fp1);    
        ch2 = getc(fp2);

        while ((ch1 != EOF) && (ch2 != EOF) && (ch1 == ch2)) 
        {
            ch1 = getc(fp1);
            ch2 = getc(fp2);
        }

        if (ch1 == ch2)
        {
            printf("Same. \n");
        }
        else if (ch1 != ch2)
        {
            printf("Different strings:\n");

            while(!feof(fp1) && !feof(fp2))
            {
                fgets(fname1, ch1, fp1);
                fgets(fname2, ch2, fp2);

                if(strcmp(fname1, fname2) != 0)
                {
                    printf("%s", fname2);
                }
            }
        }
    }

    fclose(fp1);
    fclose(fp2);
    return 0;
}

And the main function:

int main(int argc, char *argv[])
{
    if (argc == 3){
        compare(argv[1], argv[2]);
    }else{
        printf("Usage: ./what file.txt file2.txt \n");
    }
    return 0;
}

Comparing file.txt and file2.txt or file2.txt and file.txt should give the same result.

Lord Rixuel
  • As for the first character - you are first reading the files char by char and comparing. Then you are reading the strings, if the characters are not the same. But the first character of the string was already read. This is why you don't get it. – Eugene Sh. May 20 '15 at 19:23
  • Why don't you use the Linux `diff`? – Willem Van Onsem May 20 '15 at 19:24
  • "Show all differences" - so, if you have 10 lines in files A and B, and then in file B, you insert a line at index 5, do you consider all lines after and including 5 to be different, or just that one line? – user4520 May 20 '15 at 19:24
  • I would read both files line by line into arrays of strings and navigate those. Abstract from the file I/O which is cumbersome and algorithmically irrelevant. (Don't get me wrong -- I understand it's part of the assignment. But try to separate I/O from data processing. Later in life you'll be thankful if you have modular blocks which you can combine easily.) – Peter - Reinstate Monica May 20 '15 at 19:26
  • @szczurcio just that one line. – Lord Rixuel May 20 '15 at 19:26
  • So what is the difference between these strings: "ab", "ba"? – Eugene Sh. May 20 '15 at 19:27
  • @CommuSoft Or Araxis merge? Because it's an exercise, I assume. – Peter - Reinstate Monica May 20 '15 at 19:27
  • @CommuSoft I prefer not to rely too much on existing commands, and not everyone has Linux. – Lord Rixuel May 20 '15 at 19:28
  • @EugeneSh. Difference between "ab" and "ba" is the order of the characters. For me, it's like different "word". (for example, "god" and "dog" are two different words too) – Lord Rixuel May 20 '15 at 19:31
  • @LordRixuel Not everyone has Linux, but most who don't can get Cygwin or other free tools. – Politank-Z May 20 '15 at 19:32
  • @LordRixuel You didn't get my question. Consider two files, one contains lines "aaa" and "bbb", and the second contains "bbb", "aaa". What should your program output? – Eugene Sh. May 20 '15 at 19:34
  • @EugeneSh. Output should show no difference. – Lord Rixuel May 20 '15 at 19:34
  • But that is not at all how your program works. – Eugene Sh. May 20 '15 at 19:35
  • @EugeneSh. For the moment, let's just say both files are already sorted. I know I didn't put any sorting function in my program yet just for simplicity. – Lord Rixuel May 20 '15 at 19:38
  • But even if they are. You are reading the strings from two files in parallel. So a single shift will render all of the future comparisons false. "Difference" has to be well and formally defined, and the algorithm for that definition should be implemented. – Eugene Sh. May 20 '15 at 19:40
  • You can look at the files as "sets" of words, and compute the set differences `A \ B` and `B \ A` which both might be not empty. But then you should decide what is your output. Just one of them? Both? In which order? – Eugene Sh. May 20 '15 at 19:44
  • I think the approach is rather naive. Difference may mean many things. If the example is not trivial you may end up with several options, and then you need a metric to tell which is the best. – marom May 20 '15 at 20:05

2 Answers

The first character you're missing is consumed by getc() in the first loop. To fix that, either use only the second while() loop, or seek back one character before starting it.

Adashi

Your two calls to fgets() are wrong. According to the fgets() documentation:

char * fgets ( char * str, int num, FILE * stream );

str: Pointer to an array of chars where the string read is copied.

num: Maximum number of characters to be copied into str (including the terminating null-character).

stream: Pointer to a FILE object that identifies an input stream. stdin can be used as argument to read from the standard input.

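Writing into your program's argv[] strings probably won't end well: fname1 and fname2 point at the command-line arguments, and ch1 and ch2 hold character values, not buffer sizes. At a minimum you'd want to do something like: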

char string1[100];
char string2[100];

fgets(string1, 99, fp1);
fgets(string2, 99, fp2);

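Also be aware that using feof() as the loop condition like this is probably the wrong thing to do: feof() only becomes true after a read has already failed, so the loop body runs once more with stale data.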
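
A common alternative is to use the return value of fgets() itself as the loop condition and to pass the buffer size with sizeof. The sketch below assumes lines shorter than 100 characters; read_lines and LINE_MAX_LEN are hypothetical names chosen for illustration.

```c
#include <stdio.h>
#include <string.h>

#define LINE_MAX_LEN 100   /* assumed upper bound on line length */

/* Hypothetical helper: read up to max_lines lines from fp into lines[],
 * stripping the trailing newline. Returns the number of lines stored. */
size_t read_lines(FILE *fp, char lines[][LINE_MAX_LEN], size_t max_lines)
{
    size_t n = 0;

    /* fgets() returns NULL at end of file or on error, so its return
     * value is the loop condition and no feof() test is needed. */
    while (n < max_lines && fgets(lines[n], sizeof lines[n], fp) != NULL) {
        lines[n][strcspn(lines[n], "\n")] = '\0';
        n++;
    }
    return n;
}
```

A line longer than the buffer would be split across several entries here, so a real program would need to handle that case. Reading each file into an array like this also keeps the I/O apart from the comparison logic.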

As for your comparison problem: comparing two files and detecting added or deleted lines is not an easy problem. You can start at this question for some background.
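
If you only need the simplified definition discussed in the comments (treat each file as a set of lines and report the lines of one file that do not occur in the other), a naive sketch could look like this. It is not a real diff algorithm, and contains and lines_only_in are hypothetical names; the lines are assumed to be already loaded into arrays of strings.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return 1 if line occurs in set[0..n-1], else 0. */
static int contains(const char *set[], size_t n, const char *line)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(set[i], line) == 0)
            return 1;
    return 0;
}

/* Store in out[] every line of b[] that does not occur anywhere in a[],
 * keeping the order of b[]. out must have room for nb entries.
 * Returns the number of lines stored. */
size_t lines_only_in(const char *a[], size_t na,
                     const char *b[], size_t nb,
                     const char *out[])
{
    size_t count = 0;

    for (size_t i = 0; i < nb; i++)
        if (!contains(a, na, b[i]))
            out[count++] = b[i];
    return count;
}
```

With the lines of file.txt as a and those of file2.txt as b, this yields 111, 222, 333, 444. For a result that does not depend on argument order you would call it in both directions and print both results. The search is quadratic and ignores line order and duplicates, which is fine for small inputs but is not what tools like diff do.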

uesp
  • Why did you specify the buffer length to be one less than it actually is in your `fgets` call? As stated in the documentation above, that count includes the null terminator. Also, don't use hardcoded buffer sizes, this can easily lead to bugs if that size is changed, use `sizeof`. – user4520 May 20 '15 at 19:34
  • Good point on the `-1` buffer sizes, I didn't realize that. I agree with you on the other point but didn't want to make the answer too long/complex. – uesp May 20 '15 at 20:08