How to replace a lowercase character to uppercase in a text file

Question

I'm converting each lowercase character into uppercase, but it's not replacing in the file. What am I doing wrong?

#include <stdio.h>

int main() {
    FILE *file;
    char ch;
    file = fopen("file.txt", "r+");
    if (file == NULL) {
        printf("Error");
        exit(1);
    } else {
        while ((ch = fgetc(file)) != EOF) {
            if (ch >= 96 && ch <= 123) {
                ch = ch - 32;
                putc(ch, file);
            }
        }
        fclose(file);
        return 0;
    }
}

Look up `isalpha()`, `islower()`, `isupper()`, `tolower()`, `toupper()` from ``. Don't use `96` to mean back-quote (97 is `'a'`) or `123` to mean `{` (`'z'` is 122); use the literals because you won't get the numbers wrong. — Jonathan Leffler, Sep 17 '17 at 19:43
What are you doing wrong? [`fgetc` returns an int](https://stackoverflow.com/questions/35356322/difference-between-int-and-char-in-getchar-fgetc-and-putchar-fputc). Hardcoding `a` and `z` (and with invalid values. Reading and writing to the same text file with interleaved operations, not using `perror` to write the error message, not using the `islower` function, not actually using the `tolower` function... — Antti Haapala -- Слава Україні, Sep 17 '17 at 20:06
And forgetting to use `flush` after each write, thus causing undefined behaviour. — Antti Haapala -- Слава Україні, Sep 17 '17 at 20:09
Note that the `else` clause could be made 'unconditional', avoiding a level of indentation, because the body of the `if` ends with `exit()` so the code will never get beyond that. If there was a `return` there, you could also drop the `else`. Nevertheless, what you've got is OK. — Jonathan Leffler, Sep 17 '17 at 20:10
@AnttiHaapala: a positioning operation (e.g. `fseek()`) is sufficient; `fflush()` is over-heavy. — Jonathan Leffler, Sep 17 '17 at 20:10
Yeah `fseek` is sufficient, but using `fseek` without `ftell` will result in undefined behaviour on text files. — Antti Haapala -- Слава Україні, Sep 17 '17 at 20:15
that is why you should almost every time check for errors (here the return value of `putc()`) — 12431234123412341234123, Sep 17 '17 at 21:01
A couple things you have to know before you start: 1) Which character encoding is the file using? 2) For which culture/locale's rules do you want to use for the case conversion? — Tom Blodget, Sep 17 '17 at 22:43

score 3 · Answer 1 · edited Sep 17 '17 at 20:07

3

You have to open another file to write.

fileOut = fopen("fileOut.txt", "w");

ch must be integer.

int ch;

Check men page like this.

#man fgetc

And:

putc(ch,fileOut);

should be out of if block.

edited Sep 17 '17 at 20:07

Jonathan Leffler

730,956
141
904
1,278

answered Sep 17 '17 at 20:05

Younghwa Park

392
2
9

This is another way to do it, but it isn't 100% mandatory. Variations on what the question proposes can work fine. – Jonathan Leffler Sep 17 '17 at 20:08
1

this is the 100 % *sane* way to do it. – Antti Haapala -- Слава Україні Sep 17 '17 at 20:17

Stephan Lechner · Accepted Answer · 2017-09-19T22:45:01.883

The issue is that with fopen/'+', switching from reading to writing requires an intermediate call to a file positioning function, e.g. fseek:

7.21.5.3 The fopen function

(7) When a file is opened with update mode ('+' as the second or third character in the above list of mode argument values), both input and output may be performed on the associated stream. However, output shall not be directly followed by input without an intervening call to the fflush function or to a file positioning function (fseek, fsetpos, or rewind), and input shall not be directly followed by output without an intervening call to a file positioning function, unless the input operation encounters end- of-file. Opening (or creating) a text file with update mode may instead open (or create) a binary stream in some implementations.

So you probably have to write something like

fseek (fp , -1, SEEK_CUR);
putc(ch,file);
fseek (fp , 0, SEEK_CUR);

Note, however, that replacing characters one by one in a (possibly) large file is rather inefficient. The preferred way would be to read in one file, do the conversions and write to another file, and - in the end - replace the one file with the other one. If that cannot be achieved, try at least to read in / convert / write back larger chunks (not byte by byte).

Just to check whether converting byte by byte is really inefficient, I compared it to a chunk-by-chunk - approach. It turns out - at least at my machine - that chunk-by-chunk in this test is 500 times faster than byte-by-byte. The file size is about 100k:

int main(void) {
    FILE *file;
    int ch;
    file = fopen("/Users/stephanl/Desktop/ImplantmedPic_Storeblok.jpg", "rb+");
    if (file == NULL) {
        printf("Error: cannot open file.txt\n");
        return 1;
    } else {
        clock_t t1 = clock();

        // The following variant takes about 900 ticks:
        unsigned char buffer[1000];
        unsigned long l;
        while (1) {
            unsigned long fpi = ftell(file);
            l=fread(buffer, 1, 1000, file);
            if (l==0)
                break;

            for (int i=0;i<l;i++) {
                buffer[i]=~buffer[i];
            }

            fseek(file, fpi, SEEK_SET);
            fwrite(buffer,1,l, file);
            fseek(file, 0L, SEEK_CUR);
        }


        // this variant takes about 500.000 ticks
        while ((ch = fgetc(file)) != EOF) {
                fseek(file, -1L, SEEK_CUR);
                putc(~ch, file);
                fseek(file, 0L, SEEK_CUR);
        }

        fclose(file);
        clock_t t2 = clock();
        printf ("difference: %ld \n", t2-t1);
        return 0;
    }
}

At least it is standard I/O using buffering, so the `putc()` calls are not doing a physical `write()` system call. But you're diagnosis is spot on. I'm moderately sure this is a duplicate, but I'm not sure I can be bothered to find the duplicate. — Jonathan Leffler, Sep 17 '17 at 19:39
@Jonathan Leffler: I think - though not having tested it - that switching from input to output and back introduces a lot of overhead (probably flushing the buffers). And there is a similar answer (here)[https://stackoverflow.com/a/46152654/2630032], but the char-by-char - issue made me not marking the question as duplicate. — Stephan Lechner, Sep 17 '17 at 19:43
In a single-threaded process, there isn't (or doesn't need to be) all that much overhead. I'm not sure how much more severe the problem is in multi-threaded processes. And modern libraries may have to make the pessimistic assumption that they're working in a multi-threaded process. It was a whole heap easier 30 years ago. — Jonathan Leffler, Sep 17 '17 at 19:45
this isn't standards-compliant either. `fseek` to a position that was not returned by `ftell` will result in undefined behaviour on text files. In practice it will probably fail spectacularly on windows where `\n` results in 2 characters. — Antti Haapala -- Слава Україні, Sep 17 '17 at 20:16
@StephanLechner: the file should be open in binary mode and `ch` should be defined with `int` type. — chqrlie, Sep 17 '17 at 20:22
@Jonathan Leffler: Just wanted to know and tried it out: byte-by-byte is much slower than chunk-by-chunk on my machine (see edited answer). — Stephan Lechner, Sep 19 '17 at 22:47
500 times is a big factor — bigger than I would have expected. I don't know when that changed — or even if it changed (I no longer have access to the systems of yester-millennium that might have been relevant). I would certainly expect byte-by-byte to be slower, but not by that margin. Thanks for testing and reporting. (I can't give extra up-votes, I'm afraid.) — Jonathan Leffler, Sep 19 '17 at 22:50

score 1 · Answer 3 · answered Sep 17 '17 at 20:20

There are multiple problems in your code:

opening the file in read+update mode ("r+") is tricky: you must call fseek() between read and write operations, otherwise the behavior is undefined. Furthermore, you must open the file in binary mode for fseek() to operate correctly and portably on byte offsets.
ch must be defined as int for EOF to be properly detected.
hardcoding the values of 'a' and 'z' as 96 and 123 is non portable and error prone, in fact 'a' is 97 in ASCII, not 96 and 'z' is 122, not 123. Use the functions from <ctype.h> for best portability and readability.
shifting by 32 only works for ASCII, use toupper() instead.
you forgot to include <stdlib.h> that declares the exit() function.

Here is a corrected (simplistic and inefficient) version:

#include <stdio.h>

int main(void) {
    FILE *file;
    int ch;
    file = fopen("file.txt", "rb+");
    if (file == NULL) {
        printf("Error: cannot open file.txt\n");
        return 1;
    } else {
        while ((ch = fgetc(file)) != EOF) {
            if (islower(ch)) {
                fseek(file, -1L, SEEK_CUR);
                putc(toupper(ch), file);
                fseek(file, 0L, SEEK_CUR);
            }
        }
        fclose(file);
        return 0;
    }
}

score 0 · Answer 4 · answered Sep 18 '17 at 13:15

Use toupper(3) function (or macro call) from #include <ctype.h>, so you don't have to know which character set is available at your installation.

#include <stdio.h>

int main() {
    FILE *file;
    int ch;
    file = fopen("file.txt", "r+");
    if (file == NULL) {
        printf("Error");
        exit(1);
    } else {
        while ((ch = fgetc(file)) != EOF) {
            putc(toupper(ch), file);
        }
        fclose(file);
        return 0;
    }
}

NOTE

There's a flaw in your original code... you check ch to be lowercase and only output it if it happens to be. You have to output all uppercase and no-case characters also.... so don't put the putc(3) call in the if block. My code just output all the characters converted to uppercase with the function. Also, you have to define ch as int, as required by fgetc(3), toupper(3) and putc(3).

How to replace a lowercase character to uppercase in a text file

4 Answers4

NOTE