Your logic is mostly sound. You are on the right track with your train of thought:
1. Read a line into `previous` (`a`).
2. Read another line into `current` (`b`).
3. If `previous` and `current` have the same contents, go to step 2.
4. Print `previous`.
5. Move `current` to `previous`.
6. Go to step 2.
This still has some problems, however.
Unnecessary line-read
To start, consider this bit of code:
    while(fgets(b,6000,stdin)!=NULL) {
        ...
        if(test==0) {
            fgets(b,6000,stdin);
        }
        else {
            printf("%s",a);
        }
        ...
    }
If `a` and `b` have the same contents (`test==0`), you call an unchecked `fgets` to read another line, and then you read yet again when the loop condition `fgets(b,6000,stdin)!=NULL` is evaluated. That extra line lands in `b` without being compared against `a`, and because the call's return value is not checked, you cannot even tell whether a line was read at all. Either way, you then move a line you never tested from `b` to `a`. Since the loop already reads another line and checks for failure appropriately, just let the loop read the line, and invert the `if` statement's equality test to print `a` if `test!=0`.
Where's the last line?
Your logic also will not print the last line. Consider a file with 1 line. You read it, then `fgets` in the loop condition attempts to read another line, which fails because you're at the end of the file. There is no print statement outside the loop, so you never print the line.
Now what about a file with 2 lines that differ? You read the first line, then the last line, see they're different, and print the first line. Then you overwrite the first line's buffer with the last line. You fail to read another line because there aren't any more, and the last line is, again, not printed.
You can fix this by replacing the first (unchecked) `fgets` with `a[0] = 0`. That makes the first byte of `a` a null byte, which marks the end of the string, so `a` starts out as an empty string. It won't compare equal to the first line you read, so `test==1`, meaning `a` will be printed. Since `a` is empty, nothing actually appears in the output. Things then continue as normal, with the contents of `b` being moved into `a` and another line being read.
Unique last line problem
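This leaves one problem: the last line won't be printed if it's not a duplicate. To fix this, just print `b` instead of `a`. Each line is then printed as soon as it is read and found to differ from `previous`, so nothing is left waiting in a buffer when the input ends.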
The final recipe
1. Assign `0` to the first byte of `previous` (`a[0]`).
2. Read a line into `current` (`b`).
3. If `previous` and `current` have the same contents, go to step 2.
4. Print `current`.
5. Move `current` to `previous`.
6. Go to step 2.
As you can see, it's not much different from your existing logic; only steps 1 and 4 differ. It also ensures that all `fgets` calls are checked. If there are no lines in a file, nothing is printed. If there is only 1 line in a file, it is printed. If 2 adjacent lines differ, both are printed. If 2 adjacent lines are the same, only the first is printed.
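
The recipe above could be sketched roughly as follows. This is only an illustration, not your original program: the names `dedup_adjacent` and `LINE_LEN` are made up for the example, the 6000-byte buffer size is taken from your `fgets` calls, and the standard `strcmp` and `strcpy` stand in for your hand-written comparison and copy loops.

```c
#include <stdio.h>
#include <string.h>

#define LINE_LEN 6000  /* hypothetical name; size taken from the fgets calls */

/* Copies lines from `in` to `out`, skipping any line that is identical to
   the line immediately before it. Returns the number of lines written.
   The function name is an assumption made for this sketch. */
static int dedup_adjacent(FILE *in, FILE *out)
{
    char previous[LINE_LEN];
    char current[LINE_LEN];
    int printed = 0;

    previous[0] = '\0';                      /* step 1: start with an empty string */

    while (fgets(current, sizeof current, in) != NULL) {  /* step 2 (checked) */
        if (strcmp(previous, current) == 0)  /* step 3: same as previous line */
            continue;                        /* back to step 2 */
        fputs(current, out);                 /* step 4: print current */
        strcpy(previous, current);           /* step 5: current becomes previous */
        printed++;
    }                                        /* step 6: loop back to step 2 */
    return printed;
}
```

Calling `dedup_adjacent(stdin, stdout)` from `main` would give the same behavior as the recipe. Note that a line longer than the buffer is read in several pieces, which this sketch does not handle specially.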
Optional: optimizations
- Instead of comparing all 6000 bytes, you only need to compare up to the first null byte in either string, since `fgets` always adds one to mark the end of the string.
- Faster still would be to add a `break` statement inside the `if` statement of your `for` loop. If a single byte doesn't match, the entire line is not a duplicate, so you can stop comparing early. That is a lot faster if, say, only byte 10 differs in two 1000-byte lines!
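
Both optimizations together might look something like the sketch below. The function name `lines_differ` is hypothetical, and the return value follows the convention used for `test` above (0 when the lines match, 1 when they differ); your actual loop may be shaped differently.

```c
#include <stddef.h>

/* Returns 0 if the two null-terminated strings are identical, 1 otherwise.
   The function name is an assumption made for this sketch. */
static int lines_differ(const char *a, const char *b)
{
    int test = 0;
    size_t i;

    /* Keep going only while at least one string has bytes left. If one
       string ends first, the bytes at i differ and we stop right there,
       so neither string is read past its null byte. */
    for (i = 0; a[i] != '\0' || b[i] != '\0'; i++) {
        if (a[i] != b[i]) {
            test = 1;
            break;  /* one mismatch is enough; stop comparing early */
        }
    }
    return test;
}
```

With this shape, two 1000-byte lines that differ at byte 10 cost about ten comparisons instead of thousands.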