30

What is the fastest way to copy files in a python program?

It takes at least 3 times longer to copy files with shutil.copyfile() than with a regular right-click-copy > right-click-paste in Windows File Explorer or Mac's Finder. Is there any faster alternative to shutil.copyfile() in Python? What could be done to speed up the file copying process? (The files' destination is on a network drive, if that makes any difference.)

EDITED LATER:

Here is what I have ended up with:

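import subprocess
import sys

def copyWithSubprocess(cmd):
    # Run the native copy command and wait for it to finish.
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    proc.communicate()  # drain the pipes so the child cannot block on a full buffer
    return proc.returncode

win=mac=False
if sys.platform.startswith("darwin"):mac=True
elif sys.platform.startswith("win"):win=True

# source and dest are the paths of the file to copy and its destination
cmd=None
if mac: cmd=['cp', source, dest]
elif win: cmd=['xcopy', source, dest, '/K/O/X']

if cmd: copyWithSubprocess(cmd)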
Dreams
alphanumeric
  • You can use the native command line options like `cp` for Linux & Mac and `COPY` for Windows. They should be as fast as when you use the GUI. – Ecno92 Feb 27 '14 at 19:48
  • On Windows SHFileOperation gives you the native shell file copy – David Heffernan Feb 27 '14 at 19:50
  • Depending on some factors not stated in the question it could be beneficial to pack the files into a compressed archive before transmission... Have you considered using something like rsync? – moooeeeep Feb 27 '14 at 19:54
  • 1
    If you are concerned with ownership and ACLs, don't use shutil for that reason alone: ["On Windows, file owners, ACLs and alternate data streams are not copied."](http://docs.python.org/2/library/shutil.html?highlight=shutil#shutil) – Michael Burns Feb 27 '14 at 20:02
  • If I use the native operating system's commands (such as OS X cp), should I then be using subprocess? Is there any Python module to call cp directly without the need for a subprocess on Mac? – alphanumeric Feb 27 '14 at 20:04
  • 4
    It's worth noting that in Python 3.8 functions that copy files and directories [have been optimized](https://docs.python.org/3.8/library/shutil.html#shutil-platform-dependent-efficient-copy-operations) to work faster on several major OS. – Morwenn Jul 12 '18 at 09:41

4 Answers

19

The fastest version I've come up with, without over-optimizing the code, is the following:

import os

class CTError(Exception):
    def __init__(self, errors):
        self.errors = errors

try:
    O_BINARY = os.O_BINARY  # only defined on Windows
except AttributeError:
    O_BINARY = 0
READ_FLAGS = os.O_RDONLY | O_BINARY
WRITE_FLAGS = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | O_BINARY
BUFFER_SIZE = 128*1024

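def copyfile(src, dst):
    try:
        fin = os.open(src, READ_FLAGS)
        stat = os.fstat(fin)
        fout = os.open(dst, WRITE_FLAGS, stat.st_mode)
        # os.read() returns bytes, so the end-of-file sentinel must be b""
        # (on Python 2, b"" is the same as "").
        for x in iter(lambda: os.read(fin, BUFFER_SIZE), b""):
            os.write(fout, x)
    finally:
        # Best-effort cleanup; a descriptor may not exist if os.open() failed.
        try: os.close(fin)
        except Exception: pass
        try: os.close(fout)
        except Exception: pass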

def copytree(src, dst, symlinks=False, ignore=[]):
    names = os.listdir(src)

    if not os.path.exists(dst):
        os.makedirs(dst)
    errors = []
    for name in names:
        if name in ignore:
            continue
        srcname = os.path.join(src, name)
        dstname = os.path.join(dst, name)
        try:
            if symlinks and os.path.islink(srcname):
                linkto = os.readlink(srcname)
                os.symlink(linkto, dstname)
            elif os.path.isdir(srcname):
                copytree(srcname, dstname, symlinks, ignore)
            else:
                copyfile(srcname, dstname)
            # XXX What about devices, sockets etc.?
        except (IOError, os.error) as why:
            errors.append((srcname, dstname, str(why)))
        except CTError as err:
            errors.extend(err.errors)
    if errors:
        raise CTError(errors)

This code runs only slightly slower than the native Linux "cp -rf".

Compared to shutil, the gain is around 2x-3x for local storage to tmpfs and around 6x for NFS to local storage.

After profiling, I noticed that shutil.copy makes lots of fstat syscalls, which are pretty heavyweight. If one wants to optimize further, I would suggest doing a single fstat for src and reusing the values. Honestly, I didn't go further, as I got almost the same figures as the native Linux copy tool, and shaving off several hundred milliseconds wasn't my goal.
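Figures like these depend heavily on the storage, OS and Python version involved, so it may be worth measuring on your own setup. Below is a minimal timing sketch, assuming Python 3. The helper names `copy_chunked` and `time_copy` are hypothetical (they are not part of the answer), and the 8 MiB sample size and buffer size are arbitrary choices.

```python
import os
import shutil
import tempfile
import time


def copy_chunked(src, dst, buffer_size=128 * 1024):
    """Copy src to dst in chunks of buffer_size bytes."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, buffer_size)


def time_copy(copy_func, src, dst, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` copies."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        copy_func(src, dst)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src.bin")
        dst = os.path.join(tmp, "dst.bin")
        with open(src, "wb") as f:
            f.write(os.urandom(8 * 1024 * 1024))  # 8 MiB of sample data
        candidates = [
            ("shutil.copyfile", shutil.copyfile),
            ("copy_chunked", copy_chunked),
        ]
        for name, func in candidates:
            print(name, time_copy(func, src, dst))
```

Note that repeated copies of the same file are largely served from the OS cache, so for a network destination the numbers are only meaningful when the real source and destination paths are used.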

Dmytro
  • 2
    Not sure if this is specific to later versions of python (3.5+), but the sentinel in `iter` needs to be `b''` in order to stop. (At least on OSX) – muppetjones Mar 10 '17 at 18:50
  • doesn't work with python 3.6 even after fixing 'except' syntax – Vasily Dec 06 '17 at 12:42
  • This is a python2 version. I haven't tested this with python3. Probably python3 native filetree copy is fast enough, one shall do a benchmark. – Dmytro Dec 07 '17 at 17:15
  • This is way faster than anything else I tried. – Soumyajit Jan 16 '18 at 12:30
  • FYI I would suggest adding `shutil.copystat(src, dst)` after the file is written out to bring over metadata. – Spencer Jul 14 '18 at 22:27
  • @Dmytro Just tested in py3.6, still 3x faster than shutil.copy2()! – Spencer Oct 18 '18 at 21:08
  • @Spencer What did you change in the code to make it work with Python 3.5+? Best regards! – Varlor Nov 14 '18 at 09:43
  • @Varlor From what I remember I just changed this one line to the following: `for x in iter(lambda: os.read(fin, BUFFER_SIZE), b""):` (made the string into bytes by adding 'b') – Spencer Nov 14 '18 at 21:12
  • ah, because I get also a syntax error with the 'why' in the except line 'except (IOError, os.error), why:' – Varlor Nov 15 '18 at 07:21
  • To boost performance also see the [pyfastcopy](https://pypi.org/project/pyfastcopy/) module, which uses the system call sendfile(). This module works for Python 2 and 3. All you have to do is literally "import pyfastcopy" and shutil will automatically perform better. Like @Morwenn mentioned above, Python 3.8 will have sendfile() built into its implementation. – Vahid Pazirandeh Dec 12 '18 at 11:29
  • I've tried replacing `shutil.copyfile` with the `copyfile` from the example under Python `3.8.0`, but it seems to hang in the `for x in iter` loop. – Andry Nov 10 '19 at 15:27
  • @VahidPazirandeh `pyfastcopy` does not work on Windows: *The ``sendfile`` system call does not exist on Windows, so importing this module will have no effect.* – Andry Nov 10 '19 at 15:37
  • For those trying to get the best performance, buffer size is VERY important https://www.zabkat.com/blog/buffered-disk-access.htm – Spencer May 03 '22 at 19:52
6

You could simply use the native copy command of the OS you are copying on. For Windows:

from subprocess import call
call(["xcopy", "c:\\file.txt", "n:\\folder\\", "/K/O/X"])

/K - Copies attributes. Typically, Xcopy resets read-only attributes
/O - Copies file ownership and ACL information.
/X - Copies file audit settings (implies /O).

Michael Burns
  • Would "xcopy" on Windows work with a "regular" subprocess, such as: cmd = ['xcopy', source, dest, "/K/O/X"] subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE) – alphanumeric Feb 27 '14 at 20:50
  • That will work as well. – Michael Burns Feb 27 '14 at 20:55
  • Invalid number of parameters error –  Aug 18 '17 at 11:00
  • Note that the /O and /X flags require an elevated subprocess; otherwise you will get "Access denied" – Rexovas Dec 13 '20 at 21:56
  • This is a very fast option for single-file copies, but for anyone out there trying to thread large numbers of files it can run far slower (9x in a recent test of 4000 files). I was able to get better results modifying copy2 buffer size as in other answer. – Spencer May 03 '22 at 19:53
2
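import sys
import subprocess

def copyWithSubprocess(cmd):
    # Run the native copy command and wait for it to finish.
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    proc.communicate()  # drain the pipes so the child cannot block on a full buffer
    return proc.returncode

# source and dest are the paths of the file to copy and its destination
cmd=None
if sys.platform.startswith("darwin"): cmd=['cp', source, dest]
elif sys.platform.startswith("win"): cmd=['xcopy', source, dest, '/K/O/X']

if cmd: copyWithSubprocess(cmd)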
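A slightly more defensive variant of the same idea, as a sketch only (Python 3.5+ for `subprocess.run`). The function names `native_copy_command` and `copy_file` are hypothetical, and the Linux branch and the `shutil.copyfile` fallback are assumptions that are not part of the code above. It raises if the native command exits with an error instead of failing silently.

```python
import shutil
import subprocess
import sys


def native_copy_command(source, dest, platform=sys.platform):
    """Return the native copy command for the platform, or None if unknown."""
    if platform.startswith("darwin") or platform.startswith("linux"):
        return ["cp", source, dest]
    if platform.startswith("win"):
        # Same xcopy switches as above; /O and /X may need elevated rights.
        return ["xcopy", source, dest, "/K/O/X"]
    return None


def copy_file(source, dest):
    """Copy with the native tool when one is known, else fall back to shutil."""
    cmd = native_copy_command(source, dest)
    if cmd is None:
        shutil.copyfile(source, dest)
        return
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(cmd, check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
```

Usage would be `copy_file(source, dest)` with the same `source` and `dest` paths as above.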
alphanumeric
0

This is just a guess, but... you're timing it wrong. When you copy the file, it opens the file and reads it all into memory, so that when you paste, you only create a file and dump your memory contents.

In Python,

copied_file = open("some_file", "rb").read()

is the equivalent of the Ctrl+C copy.

Then

with open("new_file","wb") as f:
     f.write(copied_file)

is the equivalent of the Ctrl+V paste (so time that for equivalency...).

If you want it to be more scalable to larger data (but it's not going to be as fast as Ctrl+V/Ctrl+C):

with open(infile,"rb") as fin,open(outfile,"wb") as fout:
     # the sentinel must be b'' because the file is opened in binary mode
     fout.writelines(iter(fin.readline, b''))
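One caveat with a readline-based loop on binary data: lines end wherever a newline byte happens to occur, so the piece sizes are unpredictable and a file without newlines is read in one go. A possible alternative, sketched here under the assumption of Python 3, reads fixed-size chunks instead. The name `copy_in_chunks` and the 1 MiB default are illustrative choices, not part of the answer.

```python
from functools import partial


def copy_in_chunks(infile, outfile, chunk_size=1024 * 1024):
    """Copy infile to outfile, holding at most chunk_size bytes in memory."""
    with open(infile, "rb") as fin, open(outfile, "wb") as fout:
        # fin.read(chunk_size) returns b"" at end of file, which stops iter().
        for chunk in iter(partial(fin.read, chunk_size), b""):
            fout.write(chunk)
```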
Joran Beasley