
I would like to copy the binary file source.pdf to target.pdf. Nothing more! The code is inspired by many examples found on the Internet.

#include <stdio.h>

int main(int argc, char **argv) {

    FILE *fp1, *fp2;
    char ch;

    fp1 = fopen("source.pdf", "r");
    fp2 = fopen("target.pdf", "w");

    while((ch = fgetc(fp1)) != EOF)
        fputc(ch, fp2);

    fclose(fp1);
    fclose(fp2);

    return 0;

}

The result differs in file size.

root@vm:/home/coder/test# ls -l
-rwxr-x--- 1 root root 14593 Feb 28 10:24 source.pdf
-rw-r--r-- 1 root root   159 Mar  1 20:19 target.pdf

Ok, so what's the problem?

I know that char is signed on my platform, so any byte above 0x7F is stored as a negative value.

This is confirmed when I use printf("%x\n", ch);, which prints something like FFFFFFE1 roughly 50% of the time.
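The FFFFFFE1 output comes from sign extension: the byte is stored as a negative signed char, promoted to int when passed to printf, and then shown as an unsigned hexadecimal number. The two helpers below are hypothetical names for illustration. They assume a 32-bit int and a compiler such as GCC or Clang that keeps the bit pattern when converting a byte above 0x7F to signed char (ISO C leaves that conversion implementation-defined).

```c
/* Hypothetical helper: the value %x ends up showing when a byte has been
   stored in a signed char. 0xE1 becomes -31 (implementation-defined, but
   GCC and Clang keep the bit pattern), and -31 converted to a 32-bit
   unsigned int is 0xFFFFFFE1. */
unsigned int hex_via_signed_char(unsigned char byte)
{
    signed char ch = (signed char)byte;
    return (unsigned int)ch;
}

/* Hypothetical helper: converting to unsigned char first recovers the
   original byte value, so %x shows e1 instead of ffffffe1. */
unsigned int hex_via_unsigned_char(unsigned char byte)
{
    signed char ch = (signed char)byte;
    return (unsigned int)(unsigned char)ch;
}
```

Passing (unsigned char)ch to printf instead of ch is therefore a simple way to see the real byte value.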

The solution to my issue would be to use int instead of char.

Examples found with char: example 1, example 2, example 3, example 4, ...

Examples found with int: example a, ...

I don't use fancy compiler options.

Why do virtually all the code examples I found store the return value of fgetc() in a char instead of an int, which would be more correct?

What am I missing?

Nate Eldredge
geohei
  • `int ch` - change `char ch` to this. – user2736738 Mar 02 '18 at 05:55
  • @coderredoc, yes, I know this works, but how come there is so much code out there using `char`? How come the code using `char` works at all? – geohei Mar 02 '18 at 05:58
  • I will write an answer on this if I get time. – user2736738 Mar 02 '18 at 06:00
  • You're missing the point that most code out there on the internet (indeed most *anything* out there on the internet) is crap :-) – paxdiablo Mar 02 '18 at 06:00
  • You could just use `fread` and `fwrite`, you know. – Daniel Kamil Kozar Mar 02 '18 at 06:01
  • @Daniel Kamil Kozar, I know, but your solution eats more memory. I have to deal with huge files. My solution takes more time, I know. My initial question was completely theoretical. I would like to understand why so much code (even on reputable sites) uses `char` in connection with `fgetc`. There must be a reason! – geohei Mar 02 '18 at 06:07
  • Lots of code is broken: every piece of code that saves the result of `getchar()`, `getc()` or `fgetc()` into a variable of type `char` is broken. People don't care. The name `getchar()` is misleading because it returns an `int`. But there's nothing you can do except ignore sites where `char` is used to store the result of the character input functions. – Jonathan Leffler Mar 02 '18 at 08:04
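On the memory concern raised in the comments: fread/fwrite do not require reading the whole file at once. Copying through a fixed-size buffer keeps memory use constant regardless of file size. The sketch below assumes an 8 KiB buffer and uses a hypothetical function name, copy_stream_buffered; the buffer size is a tuning choice, not a requirement.

```c
#include <stdio.h>

#define COPY_BUF_SIZE 8192  /* assumed buffer size; any reasonable value works */

/* Hypothetical helper: copies everything from in to out through a fixed-size
   buffer, so memory use stays at COPY_BUF_SIZE bytes however large the file is.
   Returns 0 on success, -1 on a read or write error. */
int copy_stream_buffered(FILE *in, FILE *out)
{
    unsigned char buf[COPY_BUF_SIZE];
    size_t n;

    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n)
            return -1;           /* short write, e.g. disk full or I/O error */
    }
    /* fread() returning 0 means end of file or a read error; tell them apart. */
    return ferror(in) ? -1 : 0;
}
```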

2 Answers


ISO C mandates that fgetc() returns an int since it must be able to return every possible character in addition to an end-of-file indicator.

So code that places the return value into a char, and uses it to detect EOF, is generally plain wrong and should not be used.
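To see why int is needed: fgetc() returns each byte as an unsigned char converted to int (0 to 255 with 8-bit bytes), while EOF is a negative int, so an int variable can hold every possible result without collision. The hypothetical helper below counts the bytes in a stream, including 0xFF bytes, which a char-based loop could mistake for EOF.

```c
#include <stdio.h>

/* Hypothetical helper: counts all bytes in fp and stores how many of them
   were 0xFF in *ff_count. The result of fgetc() is kept in an int, so a
   0xFF byte (255) can never compare equal to the negative EOF. */
long count_bytes(FILE *fp, long *ff_count)
{
    long total = 0;
    int c;                       /* int, not char */

    *ff_count = 0;
    while ((c = fgetc(fp)) != EOF) {
        total++;
        if (c == 0xFF)           /* bytes arrive as values 0..255 */
            (*ff_count)++;
    }
    return total;
}
```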


Having said that, two of the examples you gave don't actually do that.

One of them uses fseek and ftell to get the number of bytes in the file and then uses that to control the read/write loop. That could be problematic, since the file can change size after the size is retrieved, but that's a different problem from trying to force an int into a char.
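A minimal sketch of the fseek/ftell approach described above, under stated assumptions: the function name is hypothetical and the input must be seekable. ISO C does not require binary streams to meaningfully support SEEK_END, although POSIX systems such as Linux do. Because the loop is driven by the byte count rather than by a comparison with EOF, the variable type would not cause early termination here, but the sketch still uses int.

```c
#include <stdio.h>

/* Hypothetical helper: measures the input with fseek/ftell, then copies
   exactly that many bytes. Returns the number of bytes copied, or -1 on
   error. If the file changes size between ftell() and the copy, the count
   is wrong, which is the weakness mentioned above. */
long copy_by_size(FILE *in, FILE *out)
{
    if (fseek(in, 0, SEEK_END) != 0)
        return -1;
    long size = ftell(in);
    if (size < 0)
        return -1;
    if (fseek(in, 0, SEEK_SET) != 0)
        return -1;

    for (long i = 0; i < size; i++) {
        int c = fgetc(in);
        if (c == EOF)            /* file shrank, or a read error */
            return -1;
        if (fputc(c, out) == EOF)
            return -1;
    }
    return size;
}
```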

The other uses feof immediately after the character is read to check if the end of file has been reached.
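A minimal sketch of the feof() variant, with a hypothetical function name: the end of file is detected by asking the stream after each read rather than by comparing the stored value with EOF, so a 0xFF byte does not end the loop even though ch is a char. It also checks ferror(), because a read error makes fgetc() return EOF too. Converting 255 to a signed char is implementation-defined; the round trip back through fputc() assumes a typical compiler that keeps the bit pattern.

```c
#include <stdio.h>

/* Hypothetical helper: copies in to out using a char variable, deciding
   whether each read succeeded with feof()/ferror() instead of ch == EOF.
   Returns 0 on success, -1 on a read or write error. */
int copy_with_feof(FILE *in, FILE *out)
{
    char ch;   /* deliberately char, as in the examples being discussed */

    for (;;) {
        ch = (char)fgetc(in);
        /* Ask the stream whether that read failed; don't inspect ch. */
        if (feof(in) || ferror(in))
            break;
        /* fputc() converts its argument back to unsigned char. */
        if (fputc(ch, out) == EOF)
            return -1;
    }
    return ferror(in) ? -1 : 0;
}
```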


But you're correct in that the easiest way to do it is to simply use the return value correctly, something like:

int charInt;
while ((charInt = fgetc(inputHandle)) != EOF)
    doSomethingWith(charInt);
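Applied to the original task, a complete byte-by-byte copy might look like the sketch below. copy_file is a hypothetical name; the "rb"/"wb" modes make no difference on Linux but prevent newline translation on Windows, and the error checks are additions that the question's code does not have.

```c
#include <stdio.h>

/* Hypothetical helper: copies the file src to dst one byte at a time.
   Returns 0 on success, -1 on any open, read, write or close failure. */
int copy_file(const char *src, const char *dst)
{
    FILE *in = fopen(src, "rb");     /* binary mode: no newline translation */
    if (in == NULL)
        return -1;

    FILE *out = fopen(dst, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    int ch;                          /* int: holds 0..255 as well as EOF */
    int status = 0;
    while ((ch = fgetc(in)) != EOF) {
        if (fputc(ch, out) == EOF) {
            status = -1;
            break;
        }
    }
    if (ferror(in))                  /* EOF was caused by a read error */
        status = -1;

    fclose(in);
    if (fclose(out) == EOF)          /* buffered write errors can surface here */
        status = -1;
    return status;
}
```

Calling copy_file("source.pdf", "target.pdf") should then produce a target the same size as the source.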
paxdiablo
  • Unbelievable that so many sites show wrong code (using `char`)! Many thanks for the explanation! Yes, you are right - 2 of my examples use different ways to control where to stop (EOF), but my initial question was more about the `char`/`int` issue I hit. – geohei Mar 02 '18 at 06:18
  • @geohei: Well, I know from my own experience that w3schools is ... err, less than ideal ... when it comes to code quality (and this is *my* opinion of course), the others I've never looked into. But, honestly, if you're getting your code snippets nowadays from anywhere but SO, you're probably doing it wrong :-) – paxdiablo Mar 02 '18 at 06:20
  • BTW: C has an automatic promotion to `int` for all calculations (and function arguments), so `char` will not bring the expected optimization. – Giacomo Catenazzi Mar 02 '18 at 13:19

Well, the thing is that most of the code you saw is wrong. There are three char types: signed char, unsigned char and plain char. If plain char is signed by default, then a character with decimal value 255 will compare equal to -1, the usual value of EOF. This is not what you want. (Yes, decimal value 255 is not representable in a signed char, but the conversion is implementation-defined, and most implementations simply store the bit pattern 0xFF in the char.)
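The sketch below reproduces that failure. It uses an explicit signed char to model a platform where plain char is signed (x86 compilers typically make it so) and assumes that EOF is -1 and that the conversion keeps the bit pattern, as GCC and Clang do. The function name is hypothetical; it counts bytes until the loop stops, so it returns fewer than the real size whenever the input contains a 0xFF byte.

```c
#include <stdio.h>

/* Hypothetical helper mirroring the question's loop with a signed char.
   A 0xFF byte becomes -1 (implementation-defined, but GCC and Clang do
   this), which compares equal to EOF, so counting stops at that byte. */
long count_with_signed_char(FILE *fp)
{
    long n = 0;
    signed char ch;

    while ((ch = (signed char)fgetc(fp)) != EOF)
        n++;
    return n;
}
```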

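Secondly, if plain char is unsigned, then storing EOF in it does not give EOF back: with the usual EOF value of -1, the conversion yields UCHAR_MAX, which is 255 (0xFF). For the comparison, ch is promoted back to int as 255, which never equals -1, so the test against EOF never succeeds and the loop never terminates.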
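The opposite case can be sketched the same way, with an explicit unsigned char modelling a platform where plain char is unsigned (common on ARM). Since the real loop would never end, the hypothetical helper below takes an iteration limit so the effect is observable: it keeps counting past the end of the stream until the limit is reached.

```c
#include <stdio.h>

/* Hypothetical helper mirroring the question's loop with an unsigned char.
   EOF stored in ch becomes 255, which is promoted to int 255 and never
   equals the negative EOF, so only the added limit stops the loop. */
long iterations_with_unsigned_char(FILE *fp, long limit)
{
    long n = 0;
    unsigned char ch;

    while ((ch = (unsigned char)fgetc(fp)) != EOF && n < limit)
        n++;
    return n;
}
```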

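That's why int is used: it can hold every byte value fgetc() can return (0 to UCHAR_MAX, i.e. 0 to 255 with 8-bit bytes) as well as the negative EOF, so the two can be told apart. That is how you should use it.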

user2736738