0

As in the thread "How do you append to a file?", most answers are about opening a file and appending to it, for instance:

def FileSave(content):
    with open(filename, "a") as myfile:
        myfile.write(content)

FileSave("test1 \n")
FileSave("test2 \n")

Why don't we just extract myfile and only write to it when FileSave is invoked?

global myfile
myfile = open(filename)
def FileSave(content):
    myfile.write(content)

FileSave("test1 \n")
FileSave("test2 \n")

Is the latter code better because it opens the file only once and writes to it multiple times? Or is there no difference, because Python internally guarantees that the file is opened only once even though open is called multiple times?

Eugene
  • There are a number of problems with your modified code that aren't really relevant to your question: you open the file in read-only mode, you never close the file, you have a `global` statement that does nothing… – abarnert Sep 10 '18 at 03:35

4 Answers

5

There are a number of problems with your modified code that aren't really relevant to your question: you open the file in read-only mode, you never close the file, you have a global statement that does nothing…

Let's ignore all of those and just talk about the advantages and disadvantages of opening and closing a file over and over:

  • Wastes a bit of time. If you're really unlucky, the file could even just barely keep falling out of the disk cache and waste even more time.
  • Ensures that you're always appending to the end of the file, even if some other program is also appending to the same file. (This is pretty important for, e.g., syslog-type logs.) [1]
  • Ensures that you've flushed your writes to disk at some point, which reduces the chance of lost data if your program crashes or gets killed.
  • Ensures that you've flushed your writes to disk as soon as you write them. If you try to open and read the file elsewhere in the same program, or in a different program, or if the end user just opens it in Notepad, you won't be missing the last 1.73KB worth of lines because they're still in a buffer somewhere and won't be written until later. [2]

So, it's a tradeoff. Often, you want one of those guarantees, and the performance cost isn't a big deal. Sometimes, it is a big deal and the guarantees don't matter. Sometimes, you really need both, so you have to write something complicated where you manually buffer up bits and write-and-flush them all at once.
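The "manually buffer up bits and write-and-flush them all at once" option could look roughly like the sketch below. The BatchedAppender class, its max_pending parameter, and the demo function are hypothetical names made up for illustration, not part of the original code; the idea is only to collect lines in memory and append each batch with a single open/write/close.

```python
import os
import tempfile


class BatchedAppender:
    """Hypothetical helper: collect lines in memory, append them in batches."""

    def __init__(self, path, max_pending=100):
        self.path = path
        self.max_pending = max_pending  # assumed threshold, tune as needed
        self.pending = []

    def save(self, content):
        # Keep the content in memory until enough has accumulated.
        self.pending.append(content)
        if len(self.pending) >= self.max_pending:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        # One open/write/close per batch; closing the file flushes the
        # batch out of Python's buffers.
        with open(self.path, "a") as f:
            f.write("".join(self.pending))
        self.pending.clear()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "log.txt")
        appender = BatchedAppender(path, max_pending=2)
        appender.save("test1\n")
        appender.save("test2\n")  # threshold reached, batch is written
        appender.save("test3\n")  # still only in memory
        appender.flush()          # write whatever is left
        with open(path) as f:
            return f.read()
```

The cost of this design is that anything still sitting in pending is lost if the program dies before flush is called, so the threshold is exactly the tradeoff described above.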


1. As the Python docs for open make clear, this will happen anyway on some Unix systems, but not on others, and not on Windows.

2. Also, if you have multiple writers, they're all appending a line at a time, rather than appending whenever they happen to flush, which is again pretty important for logfiles.

abarnert
  • It's not really clear which version you assign which (dis)advantage. – viraptor Sep 10 '18 at 03:54
  • @viraptor These are all advantages and disadvantages of opening and closing the file repeatedly. – abarnert Sep 10 '18 at 03:56
  • I'm not sure the second point is really true. Multiple processes can have the same file opened in the append mode and all of them will append to the end, whatever order they end up writing in. – viraptor Sep 10 '18 at 03:58
  • @viraptor And I really don't know how to make that clearer than literally saying "the advantages and disadvantages of opening and closing a file over and over:". What would make that more obvious to you? – abarnert Sep 10 '18 at 03:58
  • @viraptor No. `'a'` files always write to the end on some Unix systems, but not others, and POSIX explicitly says that's implementation-dependent and should not be relied on. And it's not true on Windows. – abarnert Sep 10 '18 at 03:59
  • @viraptor Note that the Python docs for [`open`](https://docs.python.org/3/library/functions.html#open) explicitly say that: "`'a'` for appending (which on *some* Unix systems, means that *all* writes append to the end of the file regardless of the current seek position)". – abarnert Sep 10 '18 at 04:01
  • You always append to the end, when you flush. But that's not the same end as when you wrote the data. You probably want to merge points 2 and 3 to make that more understandable. Basically if you don't .close() you almost always have to .flush(). – James Antill Sep 10 '18 at 04:03
  • @JamesAntill No, that's not true either. Although things are more complicated; I'll edit my footnote to explain that. – abarnert Sep 10 '18 at 04:04
2

In general global should be avoided if possible.

The reason people use the with statement when dealing with files is that it explicitly controls the scope in which the file is open. Once the with block exits, the file is closed. The variable still exists afterwards, but it refers to a closed file.

You can avoid the with statement, but then you must remember to call myfile.close(), which is particularly easy to forget if you're dealing with a lot of files.
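If you do skip the with statement, a try/finally is the usual way to make sure close() still runs when a write fails. This is a minimal sketch; append_lines is a hypothetical helper name, not something from the question.

```python
def append_lines(path, lines):
    """Append each line to path, closing the file even if a write fails."""
    f = open(path, "a")
    try:
        for line in lines:
            f.write(line)
    finally:
        # Runs whether or not an exception was raised above.
        f.close()
    return f.closed
```

This is essentially what the with statement does for you, which is why the with form is usually preferred.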

One way that avoids both the with block and global is:

def filesave(f_obj, string):
    f_obj.write(string)

f = open(filename, 'a')
filesave(f, "test1\n")
filesave(f, "test2\n")
f.close()

However at this point you'd be better off getting rid of the function and just simply doing:

f = open(filename, 'a')
f.write("test1\n")
f.write("test2\n")
f.close()

At which point you could easily put it within a with block:

with open(filename, 'a') as f:
    f.write("test1\n")
    f.write("test2\n")

So yes. There's no hard reason to not do what you're doing. It's just not very Pythonic.

m4p85r
1

The latter code may be more efficient, but the former is safer. Each call to FileSave closes the file, so the content it writes gets flushed to the filesystem, where other processes can read it. And because using open as a context manager closes the file handle after each call, other processes get a chance to write to the file as well (specifically on Windows).
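A rough way to see the flushing point for yourself is to read the file through a second handle while a writer still has it open. The function names below are hypothetical, and the pre-flush result depends on Python's buffering, so treat it as an illustration rather than a guarantee.

```python
import os
import tempfile


def visible_content(path):
    """Read the file through a separate handle, as another reader would."""
    with open(path) as reader:
        return reader.read()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "shared.txt")
        writer = open(path, "a")
        try:
            writer.write("test1\n")
            # With default buffering, a small write usually has not
            # reached the file yet at this point.
            before_flush = visible_content(path)
            writer.flush()
            # After flush() the data has been handed to the OS.
            after_flush = visible_content(path)
        finally:
            writer.close()
        return before_flush, after_flush
```

So if you keep the file open, calling flush() after each write gives readers the same visibility as reopening the file each time.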

blhsing
  • Allowing other programs to open the file while you're writing to it intermittently is a great way to crash your program on Windows. – Mad Physicist Sep 10 '18 at 11:17
1

It really depends on the circumstances, but here are some thoughts:

A with block absolutely guarantees that the file will be closed once the block is exited. Python does not make any weird optimizations for appending to files.
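The guarantee holds even when the block is left through an exception. A small sketch, with write_then_fail as a hypothetical name and a deliberately simulated error:

```python
def write_then_fail(path):
    """Write inside a with block, raise, and report whether the file closed."""
    f = None
    try:
        with open(path, "a") as f:
            f.write("test1\n")
            raise RuntimeError("simulated failure")
    except RuntimeError:
        # The with block has already closed the file by the time we get here.
        pass
    return f.closed
```

Note that f is still bound after the block; it just refers to a closed file.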

In general, globals make your code less modular, and therefore harder to read and maintain. You would think that the original FileSave function is attempting to avoid globals, but it's using the global name filename, so you may as well use a global file altogether at that point, as it will save you some I/O overhead.

A better option would be to avoid globals altogether, or to at least use them properly. You really don't need a separate function to wrap file.write, but if it represents something more complex, here is a design suggestion:

def save(file, content):
    print(content, file=file)

def my_thing(filename):
    with open(filename, 'a') as f:
        # do some stuff
        save(f, 'test1')
        # do more stuff
        save(f, 'test2')

if __name__ == '__main__':
    my_thing('myfile.txt')

Notice that when you call the module as a script, a file name defined in the global scope will be passed in to the main routine. However, since the main routine does not reference global variables, you can A) read it more easily because it's self-contained, and B) test it without having to wonder how to feed it inputs without breaking everything else.

Also, by using print instead of file.write, you avoid having to append newlines manually.
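To compare the two side by side, here is a small sketch that writes to an in-memory io.StringIO instead of a real file. The function names with_write and with_print are made up for the comparison.

```python
import io


def with_write(lines):
    """Use write(), adding the newline by hand."""
    buf = io.StringIO()
    for line in lines:
        buf.write(line + "\n")
    return buf.getvalue()


def with_print(lines):
    """Use print(), which appends end='\\n' by default."""
    buf = io.StringIO()
    for line in lines:
        print(line, file=buf)
    return buf.getvalue()
```

Both produce the same text, so the choice is mostly about convenience; print also converts non-string arguments with str() for you.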

Mad Physicist