#include <stdio.h>
#include <stdlib.h>

int main()
{
    FILE *ft;
    char ch;
    ft=fopen("abc.txt","r+");
    if(ft==NULL)
    {
        printf("can not open target file\n");
        exit(1);
    }
    while(1)
    {
        ch=fgetc(ft);
        if(ch==EOF)
        {
            printf("done");
            break;
        }
        if(ch=='i')
        {
            fputc('a',ft);
        }
    }
    fclose(ft);
    return 0;
}

As you can see, I want to edit abc.txt so that every i in it is replaced by a.
The program runs without errors, but when I open abc.txt afterwards, it appears to be unedited.
What could be the reason for that?

Why, in this case, is the character after i not replaced by a, as the answers suggest?

zee
  • Try `fflush()`-ing the descriptor maybe... – mike.dld Feb 22 '14 at 18:05
  • fclose(ft) before you return. – cup Feb 22 '14 at 18:05
  • `fgetc()` returns an `int`, not a `char`; it has to return every valid `char` value plus a separate value, EOF. As written, you can't reliably detect EOF. If `char` is an unsigned type, you'll never find EOF; if `char` is a signed type, you'll misidentify some valid character (often ÿ, y-umlaut, U+00FF, LATIN SMALL LETTER Y WITH DIAERESIS) as EOF. – Jonathan Leffler Feb 22 '14 at 18:06

3 Answers

Analysis

There are multiple problems:

  1. fgetc() returns an int, not a char; it has to return every valid char value plus a separate value, EOF. As written, you can't reliably detect EOF. If char is an unsigned type, you'll never find EOF; if char is a signed type, you'll misidentify some valid character (often ÿ, y-umlaut, U+00FF, LATIN SMALL LETTER Y WITH DIAERESIS) as EOF.

  2. If you switch between input and output on a file opened for update mode, you must use a file positioning operation (fseek(), rewind(), nominally fsetpos()) between reading and writing; and you must use a positioning operation or fflush() between writing and reading.

  3. It is a good idea to close what you open (now fixed in the code).

  4. If your writes worked, you'd overwrite the character after the i with a.

Synthesis

These changes lead to:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    FILE *ft;
    char const *name = "abc.txt";
    int ch;
    ft = fopen(name, "r+");
    if (ft == NULL)
    {
        fprintf(stderr, "cannot open target file %s\n", name);
        exit(1);
    }
    while ((ch = fgetc(ft)) != EOF)
    {
        if (ch == 'i')
        {
            fseek(ft, -1, SEEK_CUR);
            fputc('a',ft);
            fseek(ft, 0, SEEK_CUR);
        }
    }
    fclose(ft);
    return 0;
}

There is room for more error checking.
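As a rough sketch of what that extra checking might look like, the same loop can be wrapped in a function that tests every call. The name `replace_char_in_file` and the use of `"r+b"` are my own assumptions, not part of the code above; binary mode is chosen because the standard only guarantees relative seeks such as `-1` on binary streams.

```c
#include <stdio.h>

/* Replace every `from` byte with `to` in the named file, in place.
 * Returns the number of replacements, or -1 on any I/O error.
 * (Hypothetical helper for illustration; not from the original answer.) */
long replace_char_in_file(const char *name, int from, int to)
{
    /* Assumption: binary update mode, so fseek(fp, -1L, SEEK_CUR) is portable. */
    FILE *fp = fopen(name, "r+b");
    if (fp == NULL)
        return -1;

    long count = 0;
    int ch;                               /* int, so EOF is distinguishable */
    while ((ch = fgetc(fp)) != EOF)
    {
        if (ch != from)
            continue;
        /* Read -> write: step back over the byte just read, then overwrite it.
         * Write -> read: a positioning call is required before the next fgetc(). */
        if (fseek(fp, -1L, SEEK_CUR) != 0 ||
            fputc(to, fp) == EOF ||
            fseek(fp, 0L, SEEK_CUR) != 0)
        {
            fclose(fp);
            return -1;
        }
        count++;
    }

    /* fgetc() returns EOF for read errors as well as for end-of-file. */
    int failed = ferror(fp);
    if (fclose(fp) != 0)                  /* buffered writes can fail here */
        failed = 1;
    return failed ? -1 : count;
}
```

Calling `replace_char_in_file("abc.txt", 'i', 'a')` from `main()` and reporting a negative result on `stderr` would give roughly the behaviour of the program above.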

Exegesis

Input followed by output requires seeks

The fseek(ft, 0, SEEK_CUR); statement is required by the C standard.

ISO/IEC 9899:2011 §7.21.5.3 The fopen function

¶7 When a file is opened with update mode ('+' as the second or third character in the above list of mode argument values), both input and output may be performed on the associated stream. However, output shall not be directly followed by input without an intervening call to the fflush function or to a file positioning function (fseek, fsetpos, or rewind), and input shall not be directly followed by output without an intervening call to a file positioning function, unless the input operation encounters end-of-file. Opening (or creating) a text file with update mode may instead open (or create) a binary stream in some implementations.

(Emphasis added.)

fgetc() returns an int

Quotes from ISO/IEC 9899:2011, the current C standard.

§7.21 Input/output <stdio.h>

§7.21.1 Introduction

EOF which expands to an integer constant expression, with type int and a negative value, that is returned by several functions to indicate end-of-file, that is, no more input from a stream;

§7.21.7.1 The fgetc function

int fgetc(FILE *stream);

¶2 If the end-of-file indicator for the input stream pointed to by stream is not set and a next character is present, the fgetc function obtains that character as an unsigned char converted to an int and advances the associated file position indicator for the stream (if defined).

Returns

¶3 If the end-of-file indicator for the stream is set, or if the stream is at end-of-file, the end-of-file indicator for the stream is set and the fgetc function returns EOF. Otherwise, the fgetc function returns the next character from the input stream pointed to by stream. If a read error occurs, the error indicator for the stream is set and the fgetc function returns EOF.²⁸⁹⁾

289) An end-of-file and a read error can be distinguished by use of the feof and ferror functions.

So, EOF is a negative integer (conventionally it is -1, but the standard does not require that). The fgetc() function either returns EOF or the value of the character as an unsigned char (in the range 0..UCHAR_MAX, usually 0..255).
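To illustrate footnote 289, here is a minimal sketch of a read loop that stores the result of `fgetc()` in an `int` and then uses `ferror()` to tell a read error apart from a genuine end-of-file. The helper name `count_bytes` is made up for this example.

```c
#include <stdio.h>

/* Count the bytes remaining on a stream.
 * Returns the count, or -1 if the loop ended because of a read error.
 * (Hypothetical helper for illustration.) */
long count_bytes(FILE *fp)
{
    long n = 0;
    int ch;                       /* int: every byte value plus EOF fits */
    while ((ch = fgetc(fp)) != EOF)
        n++;
    if (ferror(fp))               /* EOF was returned because of an error */
        return -1;
    return n;                     /* otherwise feof(fp) is now nonzero */
}
```

Because `ch` is an `int`, a byte with the value 0xFF is counted like any other byte instead of ending the loop early.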

§6.2.5 Types

¶3 An object declared as type char is large enough to store any member of the basic execution character set. If a member of the basic execution character set is stored in a char object, its value is guaranteed to be nonnegative. If any other character is stored in a char object, the resulting value is implementation-defined but shall be within the range of values that can be represented in that type.

¶5 An object declared as type signed char occupies the same amount of storage as a ‘‘plain’’ char object.

¶6 For each of the signed integer types, there is a corresponding (but different) unsigned integer type (designated with the keyword unsigned) that uses the same amount of storage (including sign information) and has the same alignment requirements.

¶15 The three types char, signed char, and unsigned char are collectively called the character types. The implementation shall define char to have the same range, representation, and behavior as either signed char or unsigned char.⁴⁵⁾

45) CHAR_MIN, defined in <limits.h>, will have one of the values 0 or SCHAR_MIN, and this can be used to distinguish the two options. Irrespective of the choice made, char is a separate type from the other two and is not compatible with either.

This justifies my assertion that plain char can be a signed or an unsigned type.
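If you want to know which choice your own compiler made, footnote 45 suggests a simple check. The function name below is an assumption of mine, used only for illustration.

```c
#include <limits.h>

/* Returns 1 if plain char is a signed type on this implementation, else 0.
 * CHAR_MIN is 0 when char is unsigned and SCHAR_MIN when it is signed.
 * (Hypothetical helper for illustration.) */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

The answer differs between platforms and can even be changed by compiler options, which is exactly why code should not depend on it.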

Now consider:

char c = fgetc(fp);
if (c == EOF)
   …

Suppose fgetc() returns EOF, and plain char is an unsigned (8-bit) type, and EOF is -1. The assignment puts the value 0xFF into c, which is a positive integer. When the comparison is made, c is promoted to an int (and hence to the value 255), and 255 is not negative, so the comparison fails.

Conversely, suppose that plain char is a signed (8-bit) type and the character set is ISO 8859-15. If fgetc() returns ÿ, the value assigned will be the bit pattern 0b11111111, which is the same as -1, so in the comparison, c will be converted to -1 and the comparison c == EOF will return true even though a valid character was read.
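Both failure modes can be reproduced on one machine by using `unsigned char` and `signed char` explicitly in place of plain `char`. This is only a sketch: the function names are invented, and the results described afterwards assume 8-bit characters, two's complement conversion, and `EOF == -1`, none of which the standard guarantees.

```c
#include <stdio.h>

/* Mimics `char c = fgetc(fp); if (c == EOF)` where plain char is unsigned. */
int looks_like_eof_unsigned(int fgetc_result)
{
    unsigned char c = (unsigned char)fgetc_result;
    return c == EOF;              /* c promotes to 0..UCHAR_MAX, never negative */
}

/* Mimics the same code where plain char is signed. */
int looks_like_eof_signed(int fgetc_result)
{
    /* Converting a value above SCHAR_MAX is implementation-defined. */
    signed char c = (signed char)fgetc_result;
    return c == EOF;
}

/* The correct version: keep the int that fgetc() returned. */
int looks_like_eof_int(int fgetc_result)
{
    return fgetc_result == EOF;
}
```

Under those assumptions, `looks_like_eof_unsigned(EOF)` is 0 (end-of-file is never detected), `looks_like_eof_signed(0xFF)` is 1 (ÿ is mistaken for end-of-file), and only `looks_like_eof_int()` gives the right answer for both inputs.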

You can tweak the details, but the basic argument remains valid while sizeof(char) < sizeof(int). There are DSP chips where that doesn't apply; you have to rethink the rules. Even so, the basic point remains; fgetc() returns an int, not a char.

If your data is truly ASCII (7-bit data), then all characters are in the range 0..127 and you won't run into the misinterpretation of ÿ problem. However, if your char type is unsigned, you still have the 'cannot detect EOF' problem, so your program will run for a long time. If you need to consider portability, you will take this into account. These are the professional grade issues that you need to handle as a C programmer. You can kludge your way to programs that work on your system for your data relatively easily and without taking all these nuances into account. But your program won't work on other people's systems.

Jonathan Leffler
  • `fseek(ft, 0, SEEK_CUR);` This line is doing nothing and isn't needed. – OregonTrail Feb 22 '14 at 18:20
  • @zee: I don't know how to explain it better. Plain `char` can be a signed type or an unsigned type. Whichever it is, using `char c = fgetc(fp);` is wrong — you must read the result into an `int`. (There might be exceptions for oddball systems where `sizeof(char) == sizeof(int)`, but not otherwise.) – Jonathan Leffler Feb 22 '14 at 18:21
  • @OregonTrail: _au contraire_. The C standard requires a positioning operation between a read and a write operation on an update stream, or between a write and a read. This is a positioning operation between a write and a read. It is **not** a no-op; it places the stream into a mode which allows the next `fgetc()` to work correctly, reliably, across platforms, as required by the C standard. – Jonathan Leffler Feb 22 '14 at 18:23
  • @JonathanLeffler: but I think that even if I declare it as `char`, it still stores the ASCII code, so what difference does it make? – zee Feb 22 '14 at 18:24
  • @JonathanLeffler K&R specifies only three modes for opening a file - `"r"`, `"w"`, `"a"` with `"b"` appended sometimes to distinguish between text and binary files (page 160). What is `"r+"` mode and was it added later to the `C` standard? Are there `"w+"` and `"a+"` modes also? – ajay Feb 22 '14 at 18:25
  • @ajay : yes there are modes `"w+"` and `"a+"`. `"w+"` :- create a text file for read/write. `"a+"` :- append or create a text file for read/write – zee Feb 22 '14 at 18:27
  • @JonathanLeffler Interesting, I'll start doing that if you can point me to the relevant documentation. I began searching but I can't find it. – OregonTrail Feb 22 '14 at 18:27
  • @zee `EOF` is a value such that for any character ch, `EOF == ch` is always false. Therefore to store the value of `EOF` you should use an `int` type variable, not a `char` type, because depending on whether `char` type is signed or unsigned, the value of `EOF` may be interpreted as that of a character. – ajay Feb 22 '14 at 18:29
  • @OregonTrail Check [this](http://linux.die.net/man/3/fopen) out, it says 'Note that ANSI C requires that a file positioning function intervene between output and input, unless an input operation encounters end-of-file.' – Lee Duhem Feb 22 '14 at 18:30
  • @ajay: 7th Edition Unix only had `"r"`, `"w"`, and `"a"` modes in 1979. However, the first edition of the C standard (1989) had the extended modes (the `b` modifier, and the `+` modes), and I think the `+` modes were available even earlier. – Jonathan Leffler Feb 22 '14 at 18:30
  • @ajay : you mean to say that `EOF` could be a value out of range of `char` – zee Feb 22 '14 at 18:33
  • @zee `EOF` is not a character! Therefore, it has to be *out of range* of `char`. It's a value to signal that no more characters can be read from a stream. – ajay Feb 22 '14 at 18:34
  • @ajay : so `EOF` could also be out of range of int. Then what? – zee Feb 22 '14 at 18:36
  • @zee The value of `EOF` is an integer (commonly -1). The only requirement is it just has to be unequal to any valid character code. It cannot be out of range of `int` type. – ajay Feb 22 '14 at 18:39
  • @JonathanLeffler : `This is a positioning operation between a write and a read. It is not a no-op; it places the stream into a mode which allows the next fgetc() to work correctly, reliably, across platforms, as required by the C standard`. How does `fseek(ft, 0, SEEK_CUR);` allow fgetc() to work correctly? – zee Feb 22 '14 at 18:41
  • @ajay : but -1 comes under the range of `char` – zee Feb 22 '14 at 18:42
  • @zee That depends on whether `char` is signed or unsigned type. If `char` is signed type, then EOF will be some int value different from all character codes on the machine. I said `commonly` it's `-1`. It's an implementation detail and you don't have to bother about its actual value. Just have a type different from `char` and big enough to store it. That's all. – ajay Feb 22 '14 at 18:46
  • @JonathanLeffler : in my case, the character after `i`, is also not replaced by `a`. why? – zee Feb 22 '14 at 18:50
  • @zee: In your original code, you were not doing the `fseek()` operations. If you checked the return value from `fputc()`, you'd probably get an error indication. See also my (extensive) update and the 'Exegesis' section. The standard doesn't say how `fseek(ft, 0, SEEK_CUR)` allows `fgetc()` to work correctly; it just says that it must do so. The how is a problem for the implementation. – Jonathan Leffler Feb 22 '14 at 19:01

You are not changing the 'i' in abc.txt, you are changing the next character after 'i'. Try to put fseek(ft, -1, SEEK_CUR); before your fputc('a', ft);.

After you read an 'i' character, the file position indicator of ft points at the character after this 'i', and when you write a character with fputc(), it is written at the current file position, i.e. over the character after 'i'. See fseek(3) for further details.
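One way to see this for yourself is to ask the stream where it is right after the matching character has been read. The sketch below uses `ftell()`; the helper name `position_after_first` is hypothetical, and the offsets are only meaningful as byte counts if the file is opened in binary mode.

```c
#include <stdio.h>

/* Read until `target` is found and return the file position at that moment,
 * or -1 if the character never appears (or ftell() fails).
 * (Hypothetical helper for illustration.) */
long position_after_first(FILE *fp, int target)
{
    int ch;
    while ((ch = fgetc(fp)) != EOF)
    {
        if (ch == target)
            return ftell(fp);     /* already one past the matching character */
    }
    return -1L;
}
```

For a file containing `xyiz`, the 'i' sits at offset 2 but the position reported after reading it is 3, which is where an immediate `fputc()` would land. Hence the `fseek(ft, -1, SEEK_CUR)`.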

Lee Duhem
  • If I put `fseek(ft, -1, SEEK_CUR);` there, the loop becomes infinite. – zee Feb 22 '14 at 18:12
  • @zee No, it will not. – Lee Duhem Feb 22 '14 at 18:15
  • Oops, sorry... that was another mistake. – zee Feb 22 '14 at 18:26
  • in my case, the character after `i`, is also not replaced by `a`. why? – zee Feb 22 '14 at 18:49
  • @zee I tested the code included in your question in a Linux system, it indeed changed all the characters after these 'i' characters in `abc.txt`. How did you test your code? And in which OS? – Lee Duhem Feb 22 '14 at 18:56
  • it worked for me only when I add `fseek(ft,0,SEEK_CUR);` before and after `fputc(ft)` – zee Feb 22 '14 at 19:02
  • Put the `fseek(ft, 0, SEEK_CUR); fflush(ft);` after your `fputc('a', ft);` and try again. – Lee Duhem Feb 22 '14 at 19:03
  • Why do I have to do that? And what does `fflush(ft)` do in this case? – zee Feb 22 '14 at 19:04
  • @zee Well, I guess this is one of the differences between Windows and Linux, and according to ANSI C, it is what you should do. "For output streams, `fflush()` forces a write of all user-space buffered data for the given output or update stream via the stream's underlying write function." See [`fflush(3)`](http://linux.die.net/man/3/fflush) – Lee Duhem Feb 22 '14 at 19:08
  • And why do I need to do `fseek(ft,0,SEEK_CUR);`? – zee Feb 22 '14 at 19:10
  • @zee: because the standard says you need it, and because it doesn't work when you don't do it. How many more reasons do you need? – Jonathan Leffler Feb 22 '14 at 19:12
  • @JonathanLeffler : you got me wrong. Standards are also written for specific reasons. I want to know why this rule was made, if you know. – zee Feb 22 '14 at 19:14
  • @zee: In general, the more peculiar provisions in the C standard are there because some system or other has difficulty handling things if the provision isn't made. For an extreme example, see the restrictions on how you can use the `setjmp()` macro from `<setjmp.h>`. More nearly topical, there are restrictions on what happens with text files (trailing blanks, final newline) that make it possible for systems to comply with the standard that otherwise could not. In this case, I'm not sure of all the ins and outs, but it makes the implementation easier. Remember there's `ungetc()` to handle too. – Jonathan Leffler Feb 22 '14 at 19:25

After reading 'i' you need to "step back" to write to the correct location.

if(ch=='i')
{
    fseek(ft, -1, SEEK_CUR);
    fputc('a',ft);
}
OregonTrail
  • You also need a second `fseek()` operation after the `fputc()` according to the C standard; see my answer for the relevant quotes from the standard. – Jonathan Leffler Feb 22 '14 at 19:05