
It appears that a write() immediately following a read() on a file opened in r+ (or r+b) mode on Windows doesn't update the file.

Assume there is a file testfile.txt in the current directory with the following contents:

This is a test file.

I execute the following code:

with open("testfile.txt", "r+b") as fd:
    print fd.read(4)
    fd.write("----")

I would expect the code to print "This" and change the file contents to:

This----a test file.

This works fine on Linux, at least. On Windows, however, the message is displayed correctly but the file isn't altered; it's as if the write() were being ignored. If I call tell() on the file handle, it shows that the position has been updated (4 before the write() and 8 afterwards), yet the file is unchanged.

However, if I put an explicit fd.seek(4) just before the write() line then everything works as I'd expect.

Does anybody know the reason for this behaviour under Windows?

For reference I'm using Python 2.7.3 on Windows 7 with an NTFS partition.

EDIT

In response to comments, I tried both r+b and rb+ - the official Python docs seem to imply the former is canonical.

I put calls to fd.flush() in various places, and placing one between the read() and the write() like this:

with open("testfile.txt", "r+b") as fd:
    print fd.read(4)
    fd.flush()
    fd.write("----")

... yields the following interesting error:

IOError: [Errno 0] Error

EDIT 2

Indirectly, adding that flush() helped, because it led me to this post describing a similar problem. If one of the commenters on it is correct, it's a bug in the underlying Windows C library.

Cartroo
  • +1: This is a wonderful question and deserves upvoting. The problem is not quite obvious unless you have knowledge of C file IO. – Abhijit Jan 11 '13 at 14:19
  • Sorry to bump into this question here. I asked a similar question and I am still not clear if there's an answer to all this. Does it mean that writing to the same file on the Windows platform doesn't work as well as the documentation says, and that the "fileopen" object needs to be declared as many times as a read/write action is performed? – Cryssie Jan 11 '13 at 15:17
  • No, you can combine `read()` and `write()` on the same file as much as you like. All it means is that every time you change from calling `read()` to calling `write()`, you should insert a call to `fd.seek(0, os.SEEK_CUR)` as Abhijit mentions in his answer. This has the effect of saying "move the pointer to the same place it already is", but doing this makes it work (I won't say "fixes the problem" because that's apparently a subjective issue). If you do that, you can mix `read()` and `write()` freely; you do *not* need to open the file a second time. This behaviour only applies on Windows. – Cartroo Jan 11 '13 at 16:08

4 Answers


Python's file operations should follow the libc conventions, since internally they are implemented using C file IO functions.

Quoting from the fopen man page (or the fopen page on cplusplus.com):

For files open for appending (those which include a "+" sign), on which both input and output operations are allowed, the stream should be flushed (fflush) or repositioned (fseek, fsetpos, rewind) between either a writing operation followed by a reading operation or a reading operation which did not reach the end-of-file followed by a writing operation.

So, to summarize: if you read after writing, you need to flush the buffer (fflush) or reposition first; if you write after reading, the write must be preceded by a seek, such as fd.seek(0, os.SEEK_CUR).

So just change your code snippet to:

import os

with open("testfile.txt", "r+b") as fd:
    print fd.read(4)
    fd.seek(0, os.SEEK_CUR)  # reposition before switching from reading to writing
    fd.write("----")
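The snippet above only covers the read-to-write direction. As a minimal sketch (assuming Python 3, binary mode, and a scratch file standing in for testfile.txt; the function name read_then_write_then_read is hypothetical), here is one handle exercising both transitions from the quoted fopen() rule: a seek before switching from reading to writing, and a flush before switching back to reading.

```python
import os
import tempfile

SAMPLE = b"This is a test file."  # same contents as testfile.txt in the question


def read_then_write_then_read(path):
    """Read 4 bytes, overwrite the next 4, then read the rest, on one handle."""
    with open(path, "r+b") as fd:
        head = fd.read(4)        # reading: position is now 4
        fd.seek(0, os.SEEK_CUR)  # reading -> writing: reposition in place
        fd.write(b"----")        # overwrites bytes 4..7
        fd.flush()               # writing -> reading: flush (a seek would also do)
        rest = fd.read()         # everything from offset 8 onwards
    return head, rest


if __name__ == "__main__":
    # A scratch file stands in for testfile.txt.
    handle, path = tempfile.mkstemp()
    with os.fdopen(handle, "wb") as f:
        f.write(SAMPLE)
    try:
        head, rest = read_then_write_then_read(path)
        with open(path, "rb") as f:
            print(head, rest, f.read())
    finally:
        os.remove(path)
```

With the repositioning calls in place, the file should end up as "This----a test file." and the final read should return the remaining "a test file." bytes.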

The behavior is consistent with how a similar C program would behave:

#include <stdio.h>

int main(void)
{
    char buffer[5] = {0};
    FILE *fp = fopen("D:\\Temp\\test1.txt", "rb+");
    if (fp == NULL) {
        perror("fopen");
        return 1;
    }
    fread(buffer, sizeof(char), 4, fp);
    printf("%s\n", buffer);
    /* without this fseek, the file would not be updated */
    fseek(fp, 0, SEEK_CUR);
    fwrite("----", sizeof(char), 4, fp);
    fclose(fp);
    return 0;
}
Abhijit
  • Yup, that seems to be about the long and the short of it. I think the Windows implementation is pretty poor to require that, but it won't be my first or last gripe about the Windows APIs I'm sure. (^_^) – Cartroo Jan 11 '13 at 14:22
  • Is this answer still true for Python 3? – Boris Verkhovskiy Mar 03 '20 at 02:09
  • @Boris - apparently, I just tried it and you need to `seek` after a `read` in order to write at a specific position. – wwii Sep 18 '20 at 15:42

It appears that this is due to the behaviour of the underlying Windows libraries (which personally I regard to be in error) and nothing wrong with Python. On adding a flush() call between reading and writing (which is apparently good practice), I got an IOError with a zero errno, which is the same issue as discussed in this blog post.

From that post I found this Python issue which mentions the problem and says that the seek() call is actually the best workaround, along with a flush() every time you change from reading to writing.

All that taken into account, it seems the best way to write the code above such that it successfully runs on Windows is:

with open("testfile.txt", "r+b") as fd:
    print fd.read(4)
    fd.flush()
    fd.seek(4)
    fd.write("----")

Might be something to bear in mind for anybody attempting to write portable code.
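For portable code, the point raised in the comments below, that whatever the seek() does could be done automatically, can be handled in user code. This is a minimal sketch, assuming Python 3 and binary mode; UpdateFile is a hypothetical class, not part of any library. It remembers whether the last call was a read or a write and repositions in place whenever the direction changes, flushing first only when leaving write mode. It deliberately does not flush when switching from reading to writing, since that is the combination that produced IOError: [Errno 0] in the question.

```python
import os
import tempfile


class UpdateFile(object):
    """Hypothetical wrapper around a file opened in "r+b" mode.

    It records the direction of the last I/O call and repositions in place
    whenever the direction changes, so read() and write() can be mixed.
    """

    def __init__(self, path):
        self._fd = open(path, "r+b")
        self._last = None  # "r", "w", or None right after open/seek

    def _switch(self, op):
        if self._last is not None and self._last != op:
            if self._last == "w":
                self._fd.flush()  # write -> read: push buffered output first
            self._fd.seek(0, os.SEEK_CUR)  # reposition without moving
        self._last = op

    def read(self, size=-1):
        self._switch("r")
        return self._fd.read(size)

    def write(self, data):
        self._switch("w")
        return self._fd.write(data)

    def seek(self, offset, whence=os.SEEK_SET):
        self._last = None  # an explicit seek already satisfies the rule
        return self._fd.seek(offset, whence)

    def tell(self):
        return self._fd.tell()

    def close(self):
        self._fd.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()
        return False


if __name__ == "__main__":
    # A scratch file stands in for testfile.txt from the question.
    handle, path = tempfile.mkstemp()
    with os.fdopen(handle, "wb") as f:
        f.write(b"This is a test file.")
    try:
        with UpdateFile(path) as uf:
            print(uf.read(4))   # read
            uf.write(b"----")   # write: wrapper repositions first
            print(uf.read())    # read again: wrapper flushes and repositions
        with open(path, "rb") as f:
            print(f.read())
    finally:
        os.remove(path)
```

The design choice here is to keep the caller's code identical to the question's read-then-write sequence and move the platform-specific repositioning into one place.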

Cartroo
  • This is not a bug. See my answer. – Abhijit Jan 11 '13 at 14:20
  • Personally I think that's a matter of opinion - *known* behaviour isn't the same as *correct* behaviour. I regard that a `seek()` is required as a workaround, frankly, not intended behaviour. I can see how mixing stdio with raw file IO could require a flush, but going through the same API should always be consistent - requiring the programmer to jump through hoops when it could easily be handled in the API is, in my opinion, imperfect behaviour - or, put more concisely, a bug. – Cartroo Jan 11 '13 at 14:24
  • `but going through the same API should always be consistent` ... not always; compilers tend to add features beyond what's in the standard. Also, even if we accept this to be an issue, it's nothing to do with Python. A similar C implementation using `msvcrt` behaves the same way. – Abhijit Jan 11 '13 at 14:53
  • I absolutely agree it's nothing to do with Python - I tried to make that clear in my comment *"... a bug in the Windows libraries as opposed to Python"*. I do think it's a misfeature in Windows, though - whatever the `seek()` does could easily be done automatically. I'm not saying I expect Microsoft to fix anything, but I think it's at least *debatably* a bug. But just to reiterate, I certainly didn't mean to imply it was a Python misfeature in any way, it's definitely the Windows libraries, not Python. I've clarified my answer to hopefully make that clear. – Cartroo Jan 11 '13 at 15:02
  • I think it is reasonable to expect there will be some differences between different compilers, especially on different platforms. So the fact that behavior is different on Windows is not necessarily what I would call a bug. What I *would* call a bug is that `tell` reported a new position, even though `write` didn't change the bytes! To me, if the `write` is going to fail, it should fail "all the way". – John Y Jan 11 '13 at 15:21
  • When you say different "compilers", presumably you mean different platforms? This is the behaviour of the underlying stdio library if I've understood correctly, which shouldn't depend on your compiler. In any case I agree - if `tell()` hadn't shown the pointer as having moved, it wouldn't be so bad. Or if I'd got an error, that would be fine too. But silently doing what you don't expect is never good implementation in my opinion. Of course, one has to be pragmatic about these things, but there's nothing wrong with pointing out potentially unhelpful behaviour for other people. – Cartroo Jan 11 '13 at 16:05

Have you tried flushing?

fd.flush()

It is OS-dependent, as write() uses the filesystem caching mechanism.

Pixou
  • Nice idea - doesn't work, but it does fail in an interesting fashion. If I add `fd.flush()` *after* the `write()`, no change. However, if I add `fd.flush()` between the `read()` and the `write()` then I get `IOError: [Errno 0] Error`. That appears to be saying "Error: success", which is slightly amusing. – Cartroo Jan 11 '13 at 14:08

Is it possible that the implementation misinterprets "r+b"? AFAIK, "rb+" is for reading and writing in binary.
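Both spellings name the same mode: the C standard lists "r+b" and "rb+" as equivalent for fopen(), and Python's open() accepts either. A minimal sketch to check that the mode string is not the culprit, assuming Python 3 and a fresh scratch file per run (the helper overwrite_after_read is hypothetical), applies the same read, seek, write sequence under each spelling and compares the results.

```python
import os
import tempfile


def overwrite_after_read(path, mode):
    """Read 4 bytes, reposition, write 4 bytes; return the final file contents."""
    with open(path, mode) as fd:
        fd.read(4)
        fd.seek(0, os.SEEK_CUR)  # required when switching from read to write
        fd.write(b"----")
    with open(path, "rb") as fd:
        return fd.read()


if __name__ == "__main__":
    results = {}
    for mode in ("r+b", "rb+"):
        handle, path = tempfile.mkstemp()
        with os.fdopen(handle, "wb") as f:
            f.write(b"This is a test file.")
        try:
            results[mode] = overwrite_after_read(path, mode)
        finally:
            os.remove(path)
    print(results["r+b"] == results["rb+"])
```

If the mode spelling mattered, the two results would differ; on a conforming implementation both should read "This----a test file."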

  • I thought about that but the [official Python docs](http://docs.python.org/2/tutorial/inputoutput.html#reading-and-writing-files) specifically mentioned `r+b` so I assumed that was canonical. I did try both, which I should have mentioned. – Cartroo Jan 11 '13 at 14:04