1

Can anyone shed some light on why this example works:

import numpy as np


def write_data(fn, var):
    with open(fn, 'wb') as fout:
        header = 'TEST\n'
        np.savetxt(fout, var, fmt='%.2f', header=header, delimiter=' ', comments='')


data = np.asarray([[1.0, 2.0], [3.0, 4.0]])
out_file = 'out/test.txt'
write_data(out_file, data)

But it stops working if you change write_data to:

def write_data(fn, var):
    fout = open(fn, 'wb')
    header = 'TEST\n'
    np.savetxt(fout, var, fmt='%.2f', header=header, delimiter=' ', comments='')

I would not want to write the code as shown below, but someone came to me asking why this code does not work and I simply have no answer for them. In the top case, a file is written with the expected header and data, but in the bottom case, a file is created, but it is empty. No errors are reported, no exceptions are thrown.

Oddly, in the original case (which was much longer), printing the var would also cause the non-with example to start working, which makes me think this may be a timing issue, as printing the var in the example as shown makes no difference on my machine.

It's been pointed out that both these examples also fix the issue:

def write_data(fn, var):
    fout = open(fn, 'wb')
    header = 'TEST\n'
    np.savetxt(fout, var, fmt='%.2f', header=header, delimiter=' ', comments='')
    fout.close()

def write_data(fn, var):
    fout = open(fn, 'wb', buffering=0)
    header = 'TEST\n'
    np.savetxt(fout, var, fmt='%.2f', header=header, delimiter=' ', comments='')

But that serves to narrow down the question to: why does Python not flush file handles that are dereferenced and what causes a file buffer to get flushed, since apparently performing some other operation can cause this to happen automatically?

For example, the example below caused the problem to get 'solved', or rather circumvented in the original problem as it was brought to me (with a lot of added code not relevant to the problem).

def write_data(fn, var):
    fout = open(fn, 'wb')
    header = 'TEST\n'
    print(var)
    np.savetxt(fout, var, fmt='%.2f', header=header, delimiter=' ', comments='')
Grismar
  • 27,561
  • 4
  • 31
  • 54
  • 3
    Python buffers its output - which means if your program is still running, the output may not be written until you either close or flush the file. (Python tries to gracefully close files automatically when it exits, but if you're terminating your program abruptly, e.g. via Ctrl-C or similar, it may not be able to do so.) – Amber Jan 22 '19 at 01:29
  • The script ending would then have to count as terminating it abruptly - also, the result is the same when looping over 50+ files. All of them will be created, none of them receive content, even though the script runs for about 20 seconds. – Grismar Jan 22 '19 at 01:32
  • Buffering is not time-based, but rather based upon how much content has been written. – Amber Jan 22 '19 at 02:28
  • @Amber, do you have some clue on why adding a statement like `print(var)` before the `.savetxt()` statement would cause the content to get written to the file as well as the screen? Does writing to standard out case Python to flush file buffers as well, or something of the sort? The amount of data written should not change as a result, it seems? – Grismar Jan 22 '19 at 02:38
  • No idea. Could be some quirk of that particular Python implementation. – Amber Jan 22 '19 at 02:40
  • For anyone with similar issues: I was using Python 3.7.0 64-bit with the current build of numpy on PyPI on 2019-01-22. – Grismar Jan 22 '19 at 02:48

1 Answers1

2

I guess you should close the open file:

def write_data(fn, var):
    fout = open(fn, 'wb')
    header = 'TEST\n'
    np.savetxt(fout, var, fmt='%.2f', header=header, delimiter=' ', comments='')
    fout.close()
DachuanZhao
  • 1,181
  • 3
  • 15
  • 34
  • 1
    It appears you're right; that is, adding a `.close()` has the same effect as wrapping in `with`, which makes sense since Python will close the file in the cleanup of the `with`. But why would Python not flush buffers of files that have been created and written to when they are no longer referenced, or when the script ends? – Grismar Jan 22 '19 at 01:33
  • 2
    It appears that providing a `buffering=0` parameter to the `open()` statement also fixes the problem, so apparently it is the underlying system buffering that is to blame. Which narrows my question down to why Python would not cause dereferenced file pointers to flush their buffers? A bit more here https://stackoverflow.com/questions/3167494/how-often-does-python-flush-to-a-file – Grismar Jan 22 '19 at 01:36
  • 2
    I'm accepting your answer, since it boils down to closing the file to ensure the buffer is flushed. Strictly speaking, it didn't answer the why-question, but it did connect the dots. – Grismar Jan 22 '19 at 02:51