
I am reading Zed Shaw's book about Python.

There is an exercise asking to implement copying from one file to another in the smallest number of lines. I did it in one line; here it is:

 (open(argv[2], 'w')).write(open(argv[1]).read())

My question is, what is the worst that could happen if I do it this way? I mean, without closing the files.

I did the same thing another way; the tutorial says this is a more robust approach. What do you think?

with open(argv[1], 'r') as source:
    with open(argv[2], 'w') as dest:
        dest.write(source.read())
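As an aside, since Python 2.7 both context managers can be combined into a single `with` statement. Wrapped in a helper function (the name `copy_file` is my own, for illustration), it might look like:

```python
# A minimal sketch: one `with` statement opening both files.
# Both files are guaranteed to be closed when the block exits,
# even if an exception is raised inside it.
def copy_file(src_path, dst_path):
    with open(src_path, 'r') as source, open(dst_path, 'w') as dest:
        dest.write(source.read())
```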

I am a newcomer; these kinds of questions may look silly, but they are important to me.

Thank you for your attention. Nick

SuperManEver
  • I guess if all you care about is reading, then nothing bad will happen if you don't close, since the file will be closed anyway when the interpreter exits. If you were writing, then not closing might be bad. –  Jan 17 '15 at 15:46

1 Answer


In the most widely used implementation of Python (CPython), files are in practice closed as soon as the file object is dereferenced, because its garbage collection is mostly by reference counting.

However, the Python language does not guarantee that: you're relying on an implementation detail.

Thus, the one-liner will break in current or future Python implementations that use more advanced garbage collection techniques (e.g., delegating GC to an underlying JVM or .NET runtime).

with does guarantee closure as soon as the block exits -- in any correct implementation of the language. As such, it's definitely more robust and error-proof.
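For intuition, `with open(...)` behaves roughly like the following try/finally pattern (a sketch; the helper name `read_all` is mine):

```python
# Roughly what `with open(path) as f: ...` guarantees, spelled out:
# close() runs whether the body completes normally or raises.
def read_all(path):
    f = open(path, 'r')
    try:
        return f.read()
    finally:
        f.close()  # executed even if read() raises
```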

Furthermore, suppose the file named by argv[2] exists, but the one named by argv[1] doesn't. In the one-liner:

(open(argv[2], 'w')).write(open(argv[1]).read())

you first open the to-be-written file (thus wiping out its contents) -- with redundant parentheses, BTW, but they're innocuous:-). Then you try to open the to-be-read file, get an exception, and fail -- but the first file you opened has already been irretrievably wiped out (i.e., left empty on disk). This is unlikely to be the desired behavior in this case.
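You can watch the truncation happen with a small experiment (a sketch using a temporary file):

```python
import os
import tempfile

# Merely opening a file in 'w' mode truncates it,
# even if nothing is ever written to it.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, 'w') as f:
    f.write('precious data')

f = open(path, 'w')  # truncates immediately...
f.close()            # ...even though we wrote nothing

size = os.path.getsize(path)  # 0 -- the original contents are gone
os.unlink(path)
```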

In the with variant, you first try to open the file to be read -- if that fails, you never wipe out the file to be written. That feels like more robust behavior to me, too -- and this applies to any version of Python, past, present, or future:-).

And one more thing: I wonder whether the specs assert that the file contents fit in memory. If they don't, trying to read the file in one gulp will fail with a memory error; instead, you'd want a loop that reads and writes some large but bounded BUFFER_SIZE bytes at a time. This also underscores that for copying purposes it's better to open the files in binary mode rather than text mode (text mode is the default, so it's what your code uses, since you don't specify otherwise). In this vein, more robust is:

BUFFER_SIZE = 64 * 1024  # e.g., 64 KiB per chunk

with open(argv[1], 'rb') as source:
    with open(argv[2], 'wb') as dest:
        while True:
            buf = source.read(BUFFER_SIZE)
            if not buf:
                break
            dest.write(buf)

Funny how many little details can go wrong with just copying a file, hm?-) Which is why the key to Python bliss is learning Python's large standard library, full of modules carefully coded to take care of all the various corner cases that can arise even in the simplest task.

And the real answer to the question (the one I'd give an A+ were I interviewing a job candidate claiming mastery of Python) is (drum roll...):

import shutil
from sys import argv

shutil.copy2(argv[1], argv[2])

!-) See the docs at https://docs.python.org/2/library/shutil.html#shutil.copy2 (and, just above it, the copy and copystat functions in the same module): this safely and securely copies "permission bits, last access time, last modification time, and flags"... there's more to file copying than meets the eye, and shutil takes care of it all on your behalf.

Learning the Python standard library is at least as important as learning the language itself!-)

Alex Martelli
  • PyPy, for example, _does not_ automatically close files on dereference, because it uses garbage collection for everything. I've filed a bug on a very popular library that crashed on PyPy because it did `open(...).read()` in a tight loop and ran out of file descriptors! – Eevee Jan 18 '15 at 01:25
  • @Eevee, excellent point: PyPy is indeed an important implementation (as are IronPython / Python.Net, and Jython, for their own niches) and one thing they all have in common is that they use better GC approaches than the old reference-counting mainstay of CPython. GC is about recovering **memory** and one shouldn't count on it for other resources, such as file descriptors! Whence the importance of `with` for guaranteed finalization of all kinds of resources (FDs most often, but DB connections, locks, and umpteen others, are right up there too!-) – Alex Martelli Jan 18 '15 at 01:55
  • great and exhaustive explanation. thanks a lot – SuperManEver Jan 23 '15 at 18:42