1

I'm writing a C Program that changes all uppercase characters to lowercase from a certain file. The problem is, after storing all characters (processed) in a string, and trying to print all characters inside that file, they are appended to that file, and not overwritten. I'm using "r+" as a permission.

This is my code:

#include <stdio.h>
#include <ctype.h>
#include <stdlib.h>

#define MAX 1024

int main(int argc, char **argv)
{
    if (argc != 2) {
        printf("Invalid number of arguments.\n");
        exit(1);
    }

    FILE *f = fopen(argv[1], "r+");
    if (f == NULL)
    {
        printf("Could not open specified file.\n");
        exit(2);
    }

    char buf[MAX];
    int len = 0;
    
    while(!feof(f))
    {
        int c = fgetc(f);
        if (c >= 'A' && c <= 'Z')
            c = tolower(c);
        if (c != EOF) {
            buf[len] = c;
            len++;
        }
    }

    for (int i = 0; buf[i] != '\0'; i++)
        fputc(buf[i], f);

    fclose(f);

    return 0;
}

Same problem with fputs(buf, f). Also even after appending, some "strange" characters appear at the end of the selected file.

What's the problem, exactly? How can I fix it? Thank you. Help would be appreciated.

chqrlie
  • 131,814
  • 10
  • 121
  • 189
Mateaș Mario
  • 113
  • 1
  • 7

1 Answers1

2

Your program has incorrect and even undefined behavior for multiple reasons:

  • while(!feof(f)) does not detect end of file reliably, feof() is set after an attempt at read from f fails. You should instead write:

      while ((c = getc(f)) != EOF) {
          if (isupper(c))
              c = tolower(c);
          ...
    
  • testing for uppercase with if (c >= 'A' && c <= 'Z') works for ASCII but would fail for EBCDIC. You could instead use isupper(c) or unconditionally use tolower(c).

  • reading and writing to the same file, open for read and update mode with "r+", requires a call to fseek() or rewind() to change from reading to writing and vice versa.

  • you do not check if len stays within the boundaries of buf, causing undefined behavior when attempting to write beyond the end of the array for sufficiently long input.

  • you do not set a null terminator at buf[len], hence calling fputs(buf, f) has undefined behavior because buf is not a proper C string. Similarly, the loop in the posted code iterates while buf[i] != '\0' which causes undefined behavior as this null terminator has not been set at the end of the data read from the file.

Here is a modified version:

#include <ctype.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    FILE *f;
    int c;

    if (argc != 2) {
        printf("Invalid number of arguments.\n");
        return 1;
    }

    if ((f = fopen(argv[1], "r+")) == NULL) {
        printf("Could not open file %s.\n", argv[1]);
        return 2;
    }

    while ((c = getc(f)) != EOF) {
        if (isupper(c) {
            fseek(f, -1L, SEEK_CUR);
            putc(tolower(c), f);
            fseek(f, 0L, SEEK_CUR);
        }
    }
    fclose(f);
    return 0;
}

Note that it would be more efficient and reliable to read from the file and write to a different file. You can try this simplistic filter and compare with the above code for a large file:

#include <ctype.h>
#include <stdio.h>

int main() {
    int c;
    while ((c = getchar()) != EOF) {
        putchar(tolower(c));
    }
    return 0;
}
chqrlie
  • 131,814
  • 10
  • 121
  • 189
  • given that they explicitly check for `c != EOF` before adding to `buf`, is that end-of-file processing actually wrong here? Removing that check against `EOF` would bring the stray extra `\377` in the output, though. – ilkkachu Mar 06 '21 at 11:47
  • @ilkkachu: you are correct, the reason for the spurious bytes in the output is elsewhere, in the output loop testing for a null terminator that was never set. The reading loop logic is flawed but should not cause a problem in this particular case, yet every invalid use of `feof()` should be flagged: https://stackoverflow.com/questions/5431941/why-is-while-feof-file-always-wrong?rq=1 – chqrlie Mar 06 '21 at 14:51
  • Well. It doesn't look exactly flawed here. As far as I can see, it does test for `EOF` correctly from the return value of `fgetc()` in all cases. (`EOF` probably doesn't hit in the range test between `A` and `Z` and as far as I can find out, `tolower()` is supposed to return it unchanged anyway.) Though of course the loop could be much clearer in structure, e.g. by `break`'ing out if `c == EOF`. While it appears that using `feof()` wrong is a common mistake, I don't think we should criticise too harshly those who, for once, happen to get it actually _right_. – ilkkachu Mar 06 '21 at 14:59
  • @ilkkachu: I hope I did not get through as *harsh*... and as I wrote here above, the loop works, but the logic is flawed and required a double test to avoid incorrect output. Testing `feof(f)` before reading the file is a common mistake that indicates some misunderstanding about streams. Beginners should be taught to read and test the return value for errors: `while ((c = getc(f)) != EOF)` is the classic idiom for this case, and [`while (!feof(f))` is always wrong :)](https://stackoverflow.com/questions/5431941/why-is-while-feof-file-always-wrong) – chqrlie Mar 06 '21 at 15:05
  • ok, "harsh" was maybe a bit too... _harsh_ a choice of words :P Maybe I'm just happy if beginners get gotchas like that right, even if it's not as clean as it could be, since cleaning up is easier after that, vs. if it was _wrong_. Even better if they understand the _why_, which you maybe meant that linked question telling. (Except that I think the top answer there starts by going beside the point.) But perhaps we're getting rather diminished returns at this point. :) – ilkkachu Mar 06 '21 at 15:29