45

In Python, and in general - does a close() operation on a file object imply a flush() operation?

user207421
Adam Matan

5 Answers

39

Yes. It uses the underlying close() function which does that for you (source).
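
For illustration, a minimal sketch (the file name is just an example, not from the answer) showing that data written before close() is readable afterwards with no explicit flush():

f = open('demo.txt', 'w')   # hypothetical example file
f.write('hello\n')          # may still sit in Python's userspace buffer
f.close()                   # close() flushes that buffer for you

with open('demo.txt') as f:
    assert f.read() == 'hello\n'  # the data arrived without an explicit flush()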

tshepang
Martin Wickman
  • 2
    (In other words: that file I/O is buffered is largely abstracted and hidden away from you. Doing an `open`, `write`, `close` shouldn't leave stuff unwritten, as that's what you already intended with `write`. A buffer that routinely eats what gets thrown at it would be quite a bad design [or a hungry buffer].) – Joey Mar 15 '10 at 12:54
  • 1
    Thanks, that was my guess too. But is this true cross-platform, cross-OS, and cross-languages? – Adam Matan Mar 15 '10 at 12:57
  • @Adam Matan: That's why Python sits on top of the C libraries. To assure that "this true cross-platform, cross-OS". I don't know what "cross-languages" means. – S.Lott Mar 15 '10 at 13:18
  • 3
    +1 Thanks. By "cross-language" I meant to ask whether this behavior is similar in the vast majority of modern programming languages. – Adam Matan Mar 15 '10 at 15:25
  • While this answer is strictly speaking correct, the comments here suggest that `flush` has something to do with the OS buffering. Such interpretation is incorrect, and so I think perhaps this answer might benefit from a clarification or a [reference to the Douglas Leeder's answer](https://stackoverflow.com/a/2447205/336527). – max Jun 13 '17 at 06:49
  • link "source" no longer works. Could someone update it please? – J.A.Cado Jun 24 '20 at 13:46
  • 1
    *"Yes. It uses the underlying close() function which does that for you..."* -> I think you should read `man 2 close()` again, because this is **not true**. – CodeClown42 May 17 '21 at 15:23
  • @CodeClown42 Good point. From the process/thread point of view, what do you recommend so that it is safe to proceed to use the newly written file? I have tried using `os.fsync(f.fileno())` to force a write to the hardware buffer and then `f.close()`. But subsequent code that called `subprocess.run(['xz', 'my_file'])` got return code 1. I suspect it couldn't see the file at that point, even though I could see it when I did `ls -l my_file`. – HCSF Jun 12 '21 at 13:05
  • In the same process? I'm surprised, but then writing data to disk and immediately reading it back from the same process using a new file descriptor may be a little unusual. OTOH, regardless of the hardware events, the OS should be presenting a consistent perspective on the filesystem. I guess you could try a loop with a brief pause :/ – CodeClown42 Jun 12 '21 at 14:09
  • Although that consistency doesn't mean synchronous (except in so far as they are operations with the same descriptor); they're just atomic, meaning what absolutely shouldn't happen is you write a file and that or some other process/thread reads it and only gets half the data or something. It's all (read occurs after write) or nothing (read occurs before write) but that doesn't mean ordered procedural calls using different descriptors can expect to be synchronous (happen in the procedural order of the code): https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_09_07 – CodeClown42 Jun 12 '21 at 14:27
20

NB: close() and flush() won't ensure that the data is actually secure on the disk. They just ensure that the OS has the data, i.e. that it is no longer buffered inside the process.

You can try `sync` or `fsync` to get the data written to the disk.
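
For example, a sketch (arbitrary file name) of pushing the data all the way down: flush() moves it out of the process, and os.fsync() asks the OS to commit it to the disk:

import os

f = open('data.txt', 'w')   # arbitrary example file
f.write('important\n')
f.flush()                   # process buffer -> OS
os.fsync(f.fileno())        # OS buffer -> disk
f.close()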

Douglas Leeder
5

Yes, in Python 3 this is finally in the official documentation, but it was already the case in Python 2 (see Martin's answer).

Felix D.
1

As a complement to this question: yes, Python flushes before close; however, if you want to ensure that the data is written properly to disk, flushing is not enough.

This is how I would write a file so that it is atomically updated on a UNIX/Linux server, whether or not the target file already exists. Note that some filesystems will implicitly commit data to disk on close+rename (ext3 with data=ordered, the default; ext4 initially uncovered many application flaws before adding detection of write-close-rename patterns and syncing data before metadata in those cases [1]).

import json
import os
import tempfile

# destfile is assumed to hold the path of the file to update atomically.
# Write destfile, using a temporary name .<name>_XXXXXXXX
base, name = os.path.split(destfile)
tmpname = os.path.join(base, '.{}_'.format(name))  # This is the tmpfile prefix
with tempfile.NamedTemporaryFile('w', prefix=tmpname, delete=False) as fd:
    # Replace prefix with actual file path/name
    tmpname = str(fd.name)

    try:
        # Write fd here... ex:
        json.dump({}, fd)

        # We want to fdatasync before closing, so we need to flush before close anyway
        fd.flush()
        os.fdatasync(fd)

        # Since we're using tmpfile, we need to also set the proper permissions
        if os.path.exists(destfile):
            # Copy destination file's mode
            os.fchmod(fd.fileno(), os.stat(destfile).st_mode)
        else:
            # Set mode based on current umask value
            umask = os.umask(0o22)
            os.umask(umask)
            os.fchmod(fd.fileno(), 0o666 & ~umask)  # 0o777 for dirs and executable files

        # Now we can close and rename the file (overwriting any existing one)
        fd.close()
        os.rename(tmpname, destfile)
    except:
        # On error, try to clean up the temporary file
        try:
            os.unlink(tmpname)
        except OSError:
            pass
        raise

IMHO it would have been nice if Python provided simple methods around this... At the same time, I guess if you care about data consistency it's probably best to really understand what is going on at a low level, especially since there are many differences across operating systems and filesystems.

Also note that this does not guarantee the written data can be recovered, only that you will get a consistent copy of the data (old or new). To ensure the new data is safely written and accessible when you return, you need to os.fsync() the containing directory after the rename, and even then, if you have unsafe caches in the write path, you could still lose data. This is common on consumer-grade hardware, although any system can be configured for unsafe writes, which boosts performance too. At least, even with unsafe caches, the method above should still guarantee that whichever copy of the data you get is valid.
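
For what it's worth, a sketch of that extra step (reusing destfile from the code above): making the rename itself durable means fsync-ing the directory that contains the entry:

import os

# After os.rename(tmpname, destfile), fsync the containing directory
# so the new directory entry itself is committed to disk.
dirfd = os.open(os.path.dirname(destfile) or '.', os.O_RDONLY)
try:
    os.fsync(dirfd)
finally:
    os.close(dirfd)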

Thomas Guyot-Sionnest
  • By the way I think we could just close and fdatasync right before the rename - I'm not sure if I had any reason to sync before close, maybe just to not change the mode before the data is flushed (as an indication the write was complete? but the rename does that too), and it's not much longer either as it would save just one line. – Thomas Guyot-Sionnest Mar 02 '22 at 05:04
-8

filehandle.close does not necessarily flush. Surprisingly, filehandle.flush doesn't help either; the data can still get stuck in the OS buffers while Python is running. Observe this session, where I wrote to a file, closed it, pressed Ctrl-Z to get back to the shell command prompt, and examined the file:

$  cat xyz
ghi
$ fg
python

>>> x=open("xyz","a")
>>> x.write("morestuff\n")
>>> x.write("morestuff\n")
>>> x.write("morestuff\n")
>>> x.flush
<built-in method flush of file object at 0x7f58e0044660>
>>> x.close
<built-in method close of file object at 0x7f58e0044660>
>>> 
[1]+  Stopped                 python
$ cat xyz
ghi

Subsequently I can reopen the file, and that necessarily syncs the file (because, in this case, I open it in append mode). As the others have said, the sync syscall (available from the os module) should flush all buffers to disk, but it has possible system-wide performance implications (it syncs all files on the system).

przemek
  • 23
    Hm - I suspect your problem there is that you didn't actually **call** `flush()` or `close()` - you just ended up displaying their representation! You need parens to call those methods. – Dan Fairs Jan 08 '13 at 14:52
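
For reference, a sketch of the same session with the methods actually called (note the parentheses); with these calls, close() flushes and the appended lines do land in the file:

>>> x = open("xyz", "a")
>>> x.write("morestuff\n")
>>> x.flush()   # actually calls flush this time
>>> x.close()   # actually calls close

After this, cat xyz shows the appended lines.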