
Copying a file in Python 3 using the following code takes a lot of time:

shutil.copy(self.file, self.working_dir)

However, the Linux cp command is pretty fast. If I execute that shell command from Python 3 to copy files larger than 100 GB, will that be a reliable option for production servers?
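
For reference, this is roughly what I have in mind for calling cp from Python 3 (the paths here are just placeholders):

import subprocess

src = "/data/big_file.bin"   # placeholder source path
dst = "/mnt/storage/"        # placeholder destination directory

# Invoke the system cp; check=True raises CalledProcessError if cp fails
subprocess.run(["cp", src, dst], check=True)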

I have seen this answer, but its suggestions are not fast either.

Aviral Srivastava
  • One possible issue you might encounter using `shutil.copy()` is that on POSIX platforms, file owner and group, as well as ACLs, [will be lost](https://docs.python.org/3/library/shutil.html) during transfer. `shutil.copyfile()` should suit your needs just fine. Have you run any benchmarks and seen actual performance degradation? – hyperTrashPanda Feb 04 '19 at 12:07
  • 1
    Another thing to consider is that for a 100GB file, using the main thread may block a lot and that might not be what you want, so a new thread or a subprocess would be pretty much required to background the large copy task. – Sam Rockett Feb 04 '19 at 12:14
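
A minimal sketch of what backgrounding the copy in a worker thread (as the comment above suggests) might look like, with placeholder paths:

import shutil
import threading

def copy_in_background(src, dst):
    # Run shutil.copy in a separate thread so the main thread is not blocked
    worker = threading.Thread(target=shutil.copy, args=(src, dst))
    worker.start()
    return worker  # call worker.join() later to wait for the copy to finish

copy_thread = copy_in_background("/data/big_file.bin", "/mnt/storage/")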

1 Answer


If you are running on Windows, Python's copy buffer size may be too small: https://stackoverflow.com/a/28584857/679240

You would need to implement something similar to this (warning: untested):

def copyfile_largebuffer(src, dst, length=16*1024*1024):
    # Open the source and destination in binary mode and stream between them
    with open(dst, 'wb') as outfile, open(src, 'rb') as infile:
        copyfileobj_largebuffer(infile, outfile, length=length)

def copyfileobj_largebuffer(fsrc, fdst, length=16*1024*1024):
    # Copy in chunks of `length` bytes (16 MiB by default) instead of
    # shutil's much smaller default buffer
    while True:
        buf = fsrc.read(length)
        if not buf:
            break
        fdst.write(buf)
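
If that helps, usage would be something along these lines (the paths are placeholders, and the buffer size is just a starting point to experiment with):

copyfile_largebuffer('/data/big_file.bin', '/data/big_file_copy.bin')

# Or try an even larger buffer, e.g. 128 MiB
copyfile_largebuffer('/data/big_file.bin', '/data/big_file_copy.bin', length=128*1024*1024)
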
Haroldo_OK