
During an assignment in school I came across this phenomenon which I cannot understand.

My task was to read two files and check whether they are exactly the same. I made two text files, each containing the exact same line:

"Hello world"

I decided to compare the text character by character. At first I wrote the following code:

EDIT: Due to many requests, I've rewritten the entire code to be displayed here:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main() {
    char c1, c2;
    int ans = 1;
    FILE *f1 = fopen("text1.txt","rt");
    FILE *f2 = fopen("text2.txt","rt");
    for (fscanf(f1, "%c", &c1), fscanf(f2, "%c", &c2); 
        !feof(f1) && !feof(f2) && ans; 
        fscanf(f1, "%c", &c1), fscanf(f2, "%c", &c2))
    { // Check Data:
        if (c1 != c2) ans = 0;
        printf("%c %c\n",c1,c2); // Print side by side check
    } // Check Tail:
    if (!feof(f1)) ans=0;
    if (!feof(f2)) ans=0;

    if (ans) printf("File 1 == File 2");
    else printf("File 1 != File 2");

    return 0;
}

But for some reason the code read 'H' into c1 and 'e' into c2. Why does it work like that?

EDIT: I cannot seem to replicate the problem. This happened during a test I took at the university, so I no longer have access to the original code. The university uses an outdated Microsoft Visual Studio 2012, while I code with the 2015 Express version and NetBeans.

HazirBot
  • I'd ask `how` rather than `why`, _does it (even) work_? – Sourav Ghosh Dec 30 '15 at 07:13
  • Please see [Why is “while ( !feof (file) )” always wrong?](http://stackoverflow.com/q/5431941/2173917) – Sourav Ghosh Dec 30 '15 at 07:14
  • To add another angle, read lines using `fgets()`, strip newline, and use `strcmp()`, easier, cleaner. – Sourav Ghosh Dec 30 '15 at 07:16
  • @SouravGhosh: using `fgets`, it is not necessary to strip the newline before calling `strcmp` and stripping the newline would actually prevent detection of a file difference where one file ends with a `'\n'` and not the other. – chqrlie Dec 30 '15 at 07:29
  • @chqrlie I was taking the perspective of the visible content, anyway. :) – Sourav Ghosh Dec 30 '15 at 07:30
  • @SouravGhosh: *read two files and check wether they are exactly the same* probably refers to *actual* rather than *visible* content `:)` – chqrlie Dec 30 '15 at 07:32
  • @chqrlie Yes, maybe. In that case, you're very right. :) – Sourav Ghosh Dec 30 '15 at 07:33
  • @Giladmitrani I am wondering, why it didn't work for you. I checked the code execution (with gdb) and it seems working for me just fine. `fscanf()` is reading values from both the files correctly. – Pawan Dec 30 '15 at 08:19
  • Candidate reasons 1) Files not opened properly, 2) `c1,c2` are not the correct type. 3) Code does not check the result of `fscanf()` so why trust if `c1,c2` were written? 4) `c1, c2, ans` not initialized and whatever magic code/process used to determine "the code entered 'H' into c1 and 'e' into c2" is faulty. 5) Files not written correctly (Like one is using UTF-16). The list can go on. Posting a complete compilable code would help. – chux - Reinstate Monica Dec 30 '15 at 15:44
  • @Giladmitrani: How do you know the values of `c1` and `c2`? – chqrlie Dec 30 '15 at 16:38
  • I've edited for a full compile-able working code with implemented visual tests. Thank you guys for taking a look at this, though i am unable to replicate the error – HazirBot Dec 30 '15 at 17:02
  • *i cannot seem to replicate the problem (this happened to me during a test i took in the university, thus i cannot access the original code anymore.* My guess is you were scanning both `c1` and `c2` from `f1` in the initial part of the `for` loop, a classic cut and paste bug. – chqrlie Dec 30 '15 at 20:57

1 Answer

Looking at your code, I cannot see an explanation for the behavior you describe. You should post a minimal, complete, verifiable example so that we can see the rest of the function.

Your approach is not very effective and will fail to detect some cases where the files differ: testing `feof()` is only an approximate way to detect the end of a file, because it says nothing about whether the preceding read actually succeeded.
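One way to keep `fscanf` and still avoid relying on `feof()` is to test the value `fscanf` returns, as a comment on the question also suggests. The following is only a sketch under assumptions: the function name `same_content_fscanf` is hypothetical, and both streams are assumed to be already open for reading.

```c
#include <stdio.h>

/* Hypothetical helper (not from the original post).
   Returns 1 if both streams yield the same characters, 0 otherwise.
   The loop is driven by the return value of fscanf, not by feof(). */
int same_content_fscanf(FILE *f1, FILE *f2)
{
    char c1, c2;

    for (;;) {
        /* fscanf returns 1 when it stored a character,
           and EOF at end of input or on a read error. */
        int r1 = fscanf(f1, "%c", &c1);
        int r2 = fscanf(f2, "%c", &c2);

        if (r1 != 1 || r2 != 1) {
            /* Identical only if both streams stopped at the same point.
               Note: a read error is treated like end of file here. */
            return r1 != 1 && r2 != 1;
        }
        if (c1 != c2)
            return 0;
    }
}
```

Because every read is checked, `c1` and `c2` are only compared when both were actually written, and a stream that never reaches end of file because of an error cannot keep the loop running.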

Here is an alternative using getc:

int c1, c2;
int identical = 1;

for (;;) {
    c1 = getc(f1);
    c2 = getc(f2);

    if (c1 != c2) {
        identical = 0;
        break;
    }
    if (c1 == EOF)
        break;
}
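The loop above assumes `f1` and `f2` are already open. As a rough sketch of how it might be wrapped into a self-contained function, the version below adds the `fopen` calls and a check that they succeeded. The name `files_identical`, the `-1` return value and the binary mode `"rb"` are assumptions for illustration, not part of the original code (the question used `"rt"`, where the `t` is a Microsoft extension).

```c
#include <stdio.h>

/* Hypothetical wrapper around the getc loop.
   Returns 1 if the files are identical, 0 if they differ,
   and -1 if either file cannot be opened. */
int files_identical(const char *path1, const char *path2)
{
    FILE *f1 = fopen(path1, "rb");   /* binary mode: compare raw bytes */
    FILE *f2 = fopen(path2, "rb");
    int identical = 1;

    if (f1 == NULL || f2 == NULL) {
        if (f1 != NULL) fclose(f1);
        if (f2 != NULL) fclose(f2);
        return -1;
    }

    for (;;) {
        /* int, not char, so that EOF is distinct from every byte value */
        int c1 = getc(f1);
        int c2 = getc(f2);

        if (c1 != c2) {              /* also catches one file ending early */
            identical = 0;
            break;
        }
        if (c1 == EOF)               /* both ended at the same position */
            break;
    }

    fclose(f1);
    fclose(f2);
    return identical;
}
```

A caller could then write something like `if (files_identical("text1.txt", "text2.txt") == 1) puts("File 1 == File 2");`, treating `-1` as a separate error case.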

EDIT: After posting more code, you concluded: "I cannot seem to replicate the problem (this happened to me during a test I took at the university, so I cannot access the original code anymore)."

My guess is you were scanning both c1 and c2 from f1 in the initial part of the for loop, a classic cut and paste bug.
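To illustrate that guess, here is a small sketch of what such a bug could look like. It is a hypothetical reconstruction, since the original exam code is not available; the function name `read_first_pair_buggy` is made up for this example.

```c
#include <stdio.h>

/* Hypothetical reconstruction of the suspected cut and paste bug:
   both characters are read from f1, and f2 is never touched.
   Returns 1 if two characters were read, 0 otherwise. */
int read_first_pair_buggy(FILE *f1, FILE *f2, char *c1, char *c2)
{
    (void)f2;                          /* unused: this is the bug */

    if (fscanf(f1, "%c", c1) != 1)     /* first character of f1 */
        return 0;
    if (fscanf(f1, "%c", c2) != 1)     /* should read from f2, reads f1 again */
        return 0;
    return 1;
}
```

With a file that starts with "Hello world", the first call stores 'H' and the second stores the next character of the same stream, 'e', which matches the behavior described in the question.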

chqrlie
  • Please do not change this loop to a `do {} while` loop. Although it would be semantically equivalent, `do {} while` loops tend to be error prone. – chqrlie Dec 30 '15 at 09:40
  • I've checked other posts about `do {} while` loop but no one mentioned it's error prone. It just seemed the best suitable for this example. – seleciii44 Dec 30 '15 at 10:17
  • @seleciii44: I understand your surprise, maybe I should ask the question on SO, I have seen so many erroneous `do {} while` loops, especially for parsing files. It would be appropriate here, but I'd rather not condone its use. – chqrlie Dec 30 '15 at 10:22
  • I understand your concern. Yet, it's like telling not to use `switch` or pointers. Because almost everyone does fail once when using these. I believe the best way to learn is by mistakes. – seleciii44 Dec 30 '15 at 10:41
  • @seleciii44 `switch` has its quirks, but nowhere as many bugs as `do {} while` loops tend to induce. Lets play a little game of *show me a do/while loop, I'll show you a bug*. Give me 2 random letters. – chqrlie Dec 30 '15 at 21:02
  • you really are serious :) ok. tell me what was wrong with my edit in your answer (by the way i really wanna learn/see if i'm missing something.) – seleciii44 Dec 30 '15 at 21:10
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/99343/discussion-between-chqrlie-and-seleciii44). – chqrlie Dec 30 '15 at 21:10