1

Create a new empty file:

touch /file.txt

Read it and print the contents:

fp = fopen("/file.txt", "r");
char text[1000];
int i=0;

while(!feof(fp)){
text[i++] = getc(fp);
}

text[i]='\0';

printf("%s\n", text);

result:

ÿ

Extra info: if file.txt had many lines, the strange character would still be appended only once, at the very end. So it apparently is not something that happens on every iteration of the while loop.

2 Answers

6

If you're using the ISO 8859-15 or 8859-1 code set, the ÿ (LATIN SMALL LETTER Y WITH DIAERESIS, U+00FF in Unicode) has code 255 (decimal) or 0xFF. When you store EOF in the array, it gets converted to ÿ.

Don't store EOF in a char. And remember that getc(), like getchar(), returns an int, not a char. It has to be able to return every value that can be stored in an unsigned char, plus EOF, which is negative (usually but not necessarily -1).
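As a small sketch of the point about int versus char: the two helpers below (hypothetical names, not from the question) pretend that a read returned a real data byte and then test it against EOF, once after squeezing it into a plain char and once while keeping it in an int. The sketch assumes the common case where EOF is -1; whether plain char is signed is implementation-defined, and that is what decides the first result.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: `byte` stands for a real data byte (0..UCHAR_MAX)
   that the caller stored in a plain char before testing it. */
int char_copy_equals_eof(int byte)
{
    assert(byte >= 0 && byte <= UCHAR_MAX);
    char c = (char)byte;   /* the distinction between 0xFF and EOF can be lost here */
    return c == EOF;       /* c is promoted back to int for the comparison */
}

/* Same test, but the value stays in an int, the type getc() returns. */
int int_copy_equals_eof(int byte)
{
    assert(byte >= 0 && byte <= UCHAR_MAX);
    int c = byte;
    return c == EOF;       /* never true for a real byte, because EOF is negative */
}
```

With a signed plain char and EOF equal to -1, char_copy_equals_eof(0xFF) is true, so a genuine ÿ in the input would be mistaken for end of file. int_copy_equals_eof(0xFF) is false on every implementation.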

And, as noted in the comments, while (!feof(file)) is always wrong. This is just another reason why.

This code is fixed, more or less. It really should report an error if it fails to open the file. Note that it also ensures you don't overflow the buffer.

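FILE *fp = fopen("/file.txt", "r");
if (fp != 0)
{
    char text[1000];
    size_t i = 0;
    int c;   /* int, not char, so EOF stays distinct from every byte value */
    /* Check the bound first so no character is read and then discarded. */
    while (i < sizeof(text) - 1 && (c = getc(fp)) != EOF)
        text[i++] = c;

    text[i] = '\0';

    printf("%s\n", text);
    fclose(fp);
}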
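To illustrate why testing feof() before reading runs the loop body one time too many, here is a rough sketch that only counts iterations. Both function names are made up for this example. The end-of-file indicator is set by a read that fails, not by the read that consumes the last byte, so the first loop performs one extra getc() call whose result is EOF.

```c
#include <assert.h>
#include <stdio.h>

/* Counts loop iterations when feof() is tested before reading, as in the
   question. Note: after a read error feof() stays false, so this loop
   would never end. */
size_t count_with_feof(FILE *fp)
{
    assert(fp != NULL);
    size_t n = 0;
    while (!feof(fp)) {
        (void)getc(fp);   /* the final call returns EOF but is still counted */
        n++;
    }
    return n;
}

/* Counts iterations when the result of getc() itself is tested. */
size_t count_with_getc(FILE *fp)
{
    assert(fp != NULL);
    size_t n = 0;
    while (getc(fp) != EOF)
        n++;
    return n;
}
```

For a stream holding the three bytes "abc", count_with_feof() returns 4 and count_with_getc() returns 3. The extra iteration is the one that stored EOF in the question's array.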

See also "while ((c = getc(file)) != EOF) loop won't stop executing".

Jonathan Leffler
  • Isn't it possible that the value of `EOF` is stored in the array unchanged (if OP has `signed char`s); but then the value is converted to `unsigned` in the `printf()`? The `%c` specifier converts to `unsigned char`, and I _think_ that `%s` does the same. – ad absurdum Jun 22 '17 at 06:05
  • @DavidBowling: I guess it depends on your definition of 'unchanged'. Yes, No, Maybe. If you have a signed plain `char` type, then `-1` is a valid character code, for a valid character, assuming an appropriate (non-UTF8) character set such as 8859-1. However, there's room to argue that EOF is an integer value, not a `char` value, so when it is stored in a (signed) `char`, there are bits truncated. In the end, it doesn't really matter — you shouldn't save the value returned by `getchar()` into a `char` until you know that it is not EOF. Beware misidentifying ÿ as EOF if it appears in text. – Jonathan Leffler Jun 22 '17 at 14:19
  • I only meant to suggest that it seems _possible_ that conversion to `unsigned char` may happen in the call to `printf()`, rather than in the assignment to the array. Of course, I wouldn't dare suggest that it is correct to store the value of `EOF` in an array of `char`s , `signed` or otherwise ;) – ad absurdum Jun 22 '17 at 14:24
  • BTW, I am not sure that characters are converted to `unsigned char` with `%s` in calls to `printf()`. I looked in the Standard and didn't see anything explicitly saying this; I'll look some more when I get a chance.... – ad absurdum Jun 22 '17 at 14:26
  • It gets a bit tricky. For `%c`, the standard says 'The `int` argument shall be converted to an `unsigned char`, and the resulting byte shall be written.' For `%s`, it says 'The argument shall be a pointer to an array of `char`. Bytes from the array shall be written up to (but not including) any terminating null byte.' The difference is that the default promotion rules apply to a single `char` passed to `printf()` in the `...` portion of the argument list, but no change occurs to the character data when passed via a pointer. I suspect we're mostly in vehement agreement in practice. – Jonathan Leffler Jun 22 '17 at 14:28
  • Incidentally, I was quoting from POSIX [`printf()`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/printf.html). ISO/IEC 9899:2011 uses slightly different wording — but is functionally equivalent: _If no `l` length modifier is present, the `int` argument is converted to an `unsigned char`, and the resulting character is written._ and _If no `l` length modifier is present, the argument shall be a pointer to the initial element of an array of character type. Characters from the array are written up to (but not including) the terminating null character._ (Footnote dropped.) – Jonathan Leffler Jun 22 '17 at 14:36
  • 'Tis interesting that _character type_ covers all three of `char`, `signed char` and `unsigned char` in the C standard. That could well be significant. – Jonathan Leffler Jun 22 '17 at 14:39
  • That is what I thought, when I read the passage you quote from the C Standard. Specifically, I was wondering if one could consider that a string is printed by `fprintf()` as if the characters were written by `fputc()`, which would entail conversion to `unsigned char`. But, I see no real evidence for this conclusion. This is probably angels on the head of a pin.... – ad absurdum Jun 22 '17 at 14:48
  • FYI: §7.21.3 ¶12 says 'The byte output functions write characters to the stream as if by successive calls to the `fputc` function.' In this context, `fprintf()` is a byte output function; see §7.21.1 ¶5 (last bullet point). – Jonathan Leffler Jun 22 '17 at 15:13
5

The ÿ is byte 255 in your code page, which is what the constant EOF becomes when it is coerced into a char. Instead of using feof, store the return value of getc in an int and compare it against EOF. Here's an easy-to-read example (note that you'd need bounds checking too):

while (1) {
    int c = getc(fp);
    if (c == EOF) {
        break;
    }
    text[i++] = c;
}
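One way the bounds checking mentioned above might look is sketched here as a standalone function. The name read_text and its parameters are assumptions made for this example, not part of the question's code. It tests the remaining space before each read, so no character is consumed unless there is room for it, and it always NUL-terminates the buffer.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: reads at most cap - 1 bytes from fp into buf,
   NUL-terminates buf, and returns the number of bytes stored. */
size_t read_text(FILE *fp, char *buf, size_t cap)
{
    assert(fp != NULL && buf != NULL && cap > 0);
    size_t i = 0;
    int c;   /* int so that EOF can be told apart from every byte value */

    /* Check for space first, then read; stop at end of file or on error. */
    while (i < cap - 1 && (c = getc(fp)) != EOF)
        buf[i++] = (char)c;

    buf[i] = '\0';
    return i;
}
```

A caller with the question's array could write read_text(fp, text, sizeof text) and then print text. Input that does not fit stays unread in the stream, so a second call picks up where the first one stopped.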
  • For illustration, I recommend the following link: [Wikipedia: Extended ASCII](https://en.wikipedia.org/wiki/Extended_ASCII) where the `ÿ` can be seen in the included picture [Output of the program ascii in Cygwin](https://en.wikipedia.org/wiki/File:Table_ascii_extended.png). – Scheff's Cat Jun 22 '17 at 05:51