13

I've been taught that the best way to read a file in Python is to do something like:

with open('file.txt', 'r') as f1:
    for line in f1:
        do_something()

But I have been thinking. If my goal is to copy the contents of one file completely to another, are there any dangers of doing this:

with open('file2.txt', 'w+') as output, open('file.txt', 'r') as input:
    output.write(input.read())

Is it possible for this to behave in some way I don't expect?

Along the same lines, how would I handle the problem if the file is a binary file rather than a text file? In that case there would be no newline characters, so readline() or for line in file wouldn't work (right?).

EDIT Yes, I know about shutil. There are many better ways to copy a file if that is exactly what I want to do. I want to know about the potential risks, if any, of this approach specifically, because I may need to do more advanced things than simply copying one file to another (such as copying several files into a single one).

ewok
  • You can use multiple context managers in a single line, you know? – jonrsharpe Apr 26 '16 at 20:41
  • if your goal is to copy contents of a file, use https://docs.python.org/2/library/shutil.html#shutil.copyfile – nathan.medz Apr 26 '16 at 20:41
  • @nathan.meadows I'm imagining a situation where I have to do something more complicated, such as copy several files into 1, for example – ewok Apr 26 '16 at 20:45
  • 2
    `with open('file2.txt', 'w+') as output, open('file.txt', 'r') as input:` – jonrsharpe Apr 26 '16 at 20:45
  • 1
    If you just want to copy one file to another, you could do: `from shutil import copyfile; copyfile('file1.txt', 'file2.txt')`. To concatenate multiple text files, please check out this thread: http://stackoverflow.com/questions/13613336/python-concatenate-text-files – Quinn Apr 26 '16 at 21:05

3 Answers

16

Please note that the shutil module also contains copyfileobj(), basically implemented like Barmar's answer.

Or, to answer your question:

from shutil import copyfileobj

with open('file2.txt', 'wb') as output, open('file.txt', 'rb') as input:
    copyfileobj(input, output)

would be my suggestion. It avoids re-implementing the buffering mechanism and, should the implementation of the standard library improve, your code wins as well.
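The same call also covers the "several files into a single one" case from the question. A minimal sketch, assuming a hypothetical helper named concatenate_files (not part of shutil) and that plain byte-for-byte concatenation is what is wanted:

```python
from shutil import copyfileobj


def concatenate_files(source_paths, destination_path):
    """Write the bytes of each source file, in order, into one destination file.

    concatenate_files is an illustrative name, not a standard library function.
    """
    # Binary mode on both sides so no newline translation or decoding happens.
    with open(destination_path, 'wb') as output:
        for path in source_paths:
            with open(path, 'rb') as source:
                # copyfileobj reads and writes in chunks, so no source file
                # has to fit in memory as a whole.
                copyfileobj(source, output)
```

Calling concatenate_files(['a.txt', 'b.txt'], 'combined.txt') would overwrite combined.txt with the contents of a.txt followed by b.txt; the file names here are placeholders.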


On Unix, there also is a non-standardised syscall called sendfile. It is used mostly for sending data from an open file to a socket (serving HTTP requests, etc.).

Linux also allows using it to copy data between regular files. Other platforms don't; check the Python docs and your man pages. Because the copy happens inside a syscall, the kernel moves the content without copying buffers to and from userland.

The os module offers os.sendfile() since Python 3.3. You could use it like:

import io
import os

with open('file2.txt', 'wb') as output, open('file.txt', 'rb') as input:
    offset = 0 # instructs sendfile to start reading at start of input
    input_size = input.seek(0, io.SEEK_END)
    os.sendfile(output.fileno(), input.fileno(), offset, input_size)

Otherwise, there is a package on PyPI, pysendfile, implementing the syscall. It works exactly as above, just replace os.sendfile with sendfile.sendfile (and import sendfile).
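One detail worth spelling out: os.sendfile() returns the number of bytes it actually sent, which can be fewer than requested, so a robust copy calls it in a loop. The following is a sketch under assumptions, not a definitive implementation: copy_with_sendfile is a hypothetical helper name, and the fallback to shutil.copyfileobj() is my own choice for platforms where os.sendfile is missing or refuses regular files.

```python
import os
import shutil


def copy_with_sendfile(source_path, destination_path):
    """Copy a file with os.sendfile, falling back to shutil.copyfileobj.

    copy_with_sendfile is an illustrative name, not a standard library function.
    """
    with open(source_path, 'rb') as source, open(destination_path, 'wb') as output:
        remaining = os.fstat(source.fileno()).st_size
        offset = 0
        try:
            while remaining > 0:
                # sendfile may transfer fewer bytes than asked for.
                sent = os.sendfile(output.fileno(), source.fileno(),
                                   offset, remaining)
                if sent == 0:  # unexpected end of input
                    break
                offset += sent
                remaining -= sent
        except (AttributeError, OSError):
            # os.sendfile is unavailable, or it does not support file-to-file
            # copies here: continue from wherever the copy stopped.
            source.seek(offset)
            output.seek(offset)
            shutil.copyfileobj(source, output)
```

The fallback keeps the function usable on platforms without file-to-file sendfile, at the cost of hiding which code path actually ran.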

Seoester
  • Be aware that this is platform-specific! The sendfile syscall normally supports writing only to sockets (TCP, for example), not to ordinary file handles. – user582175 Apr 17 '20 at 13:04
9

The only potential problem with your output.write(input.read()) version is that the file may be too large to hold in memory all at once. You can use a loop that reads smaller batches instead.

with open('file2.txt', 'wb+') as output, open('file.txt', 'rb') as input:
    while True:
        data = input.read(100000)
        if not data:  # end of file reached: read() returned an empty result
            break
        output.write(data)

This works for both text and binary files, but you need the b modifier in the modes for portable operation on binary files.
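Note that in Python 3 a file opened in binary mode returns bytes from read(), so the end-of-file value is b'' rather than ''. A minimal sketch of the same chunked loop using iter() with a sentinel; copy_in_chunks and the 64 KiB default are illustrative choices, not taken from the answer:

```python
from functools import partial


def copy_in_chunks(source_path, destination_path, chunk_size=64 * 1024):
    """Copy a file in fixed-size chunks without loading it all into memory.

    copy_in_chunks is an illustrative name; chunk_size is an arbitrary default.
    """
    with open(source_path, 'rb') as source, open(destination_path, 'wb') as output:
        # iter(callable, sentinel) keeps calling source.read(chunk_size)
        # until it returns b'', which is what binary read() yields at EOF.
        for chunk in iter(partial(source.read, chunk_size), b''):
            output.write(chunk)
```

Because the file is read in fixed-size blocks rather than by line, this does not depend on the data containing newline characters.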

Barmar
4

This may not completely answer your question, but for plain copying without any other processing of the file contents, you should consider other means, e.g. the shutil module:

import shutil

shutil.copy('file.txt', 'file2.txt')
user2390182