
I've searched high and low, and each time I find something that looks promising, it hasn't panned out.

Ultimately I want to grab the real time progress of a file copy on a linux machine from inside python. I'll take that progress and emit it to a client web page with Flask-SocketIO, likely threaded to avoid blocking.

I don't mind if it's rsync, copy, or any other means...(shutil etc) to handle the actual copy. I just want a hook to push an update over the socket.

Thus far I've found this to be the most promising. However, I'm not quite grasping its console printing mechanism, because when I try to print the output to a file, or just with a regular Python print, it comes out one character at a time.

import subprocess
import sys

def copy_with_progress(src, dst):
    cmd = 'rsync --progress --no-inc-recursive %s %s'%(src, dst)
    sub_process = subprocess.Popen(cmd, close_fds=True, shell=True,
                                   stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    while sub_process.poll() is None:
        out = sub_process.stdout.read(1)
        sys.stdout.write(out)
        sys.stdout.flush()


src = '/home/user/Downloads/large_file.tar'
dst = '/media/usbdrive/large_file.tar'

copy_with_progress(src, dst)

Which came from this SO question: Getting realtime output using subprocess

However, this reports the output back over stdout. I'd like to capture this output in a variable and emit it.

The stdout progress looks like this, with a single line being updated constantly: `large_file.tar 323,780,608 19% 102.99MB/s 0:00:12`. When I print the variable named `out`, I get a single character at a time, cycling over and over on a new line.

How do I capture this info in a way that's useable for transmitting to client side?

Is there a way to grab the entire line for each refresh of the status?
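For reference, one way to get a whole line per refresh instead of single characters is to read the pipe in larger chunks and split on carriage returns, since rsync redraws its progress line with `\r` rather than `\n`. A minimal sketch of just the splitting step (the function name is my own, not from rsync or subprocess):

```python
def extract_progress_lines(buffer):
    """Split a raw byte buffer on carriage returns.

    rsync redraws its progress display with '\r', so each complete
    chunk between '\r' characters is one snapshot of the progress line.
    Returns (complete_lines, remainder); the remainder is the start of
    an unfinished line to carry over to the next read.
    """
    parts = buffer.split(b'\r')
    complete = [p.strip() for p in parts[:-1] if p.strip()]
    return complete, parts[-1]
```

In a loop like the one above, you would call this on each larger read (say `read(1024)`), keep the remainder as the buffer for the next iteration, and emit each complete line instead of each byte.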

Kenny Powers
    You can just `stat` the source file to get the total size, then periodically `stat` the destination file to get the current size as long as the subprocess is running. You might look at the `tqdm` package (in manual mode) for the user interface. – o11c May 23 '16 at 22:41
  • You are reading one byte at a time, so that is what you would expect to see – Padraic Cunningham May 23 '16 at 23:05
  • How would you read all available bytes for each iteration of the loop instead? – Kenny Powers May 24 '16 at 02:03
  • It is a bit faster to use `sub_process.stdout.readline()` instead of `sub_process.stdout.read(1)` – Agile Bean Jan 02 '21 at 19:27
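o11c's size-polling suggestion in the comments can be sketched like this (the function name is illustrative; the polling loop, subprocess handling, and socket emit are left out):

```python
import os

def copy_progress(src, dst):
    """Return (copied_bytes, total_bytes) by comparing file sizes.

    Per o11c's comment: stat the source once for the total, then poll
    the destination's size while the copy subprocess is still running.
    """
    total = os.path.getsize(src)
    try:
        copied = os.path.getsize(dst)
    except OSError:  # destination file not created yet
        copied = 0
    return copied, total
```

You would call this periodically (e.g. from a background thread) while the copy runs, and emit `100.0 * copied / total` to the client.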

1 Answer


What I've done in the past is to copy the data in chunks and use a callback function to monitor the progress. Something like:

# Python_2
import os

def copy_with_callback(sourceFile, destinationFile, callbackFunction):
    chunk = 4*1024
    sourceSize = os.path.getsize(sourceFile)
    destSize = 0
    with open(sourceFile, 'rb') as fSrc:
        with open(destinationFile, 'wb') as fDest:
            while True:
                data = fSrc.read(chunk)
                if len(data) == 0:
                    break
                fDest.write(data)
                destSize += len(data)
                callbackFunction(sourceSize, destSize)

def example_callback_function(srcSize, dstSize):
    ''' Just an example with print.  Your viewer code will vary '''
    print 'Do something with these values:', srcSize, dstSize
    print 'Percent?', 100.0 * dstSize / srcSize

def main():
    src = '/tmp/A/path/to/a/file.txt'
    dest = '/tmp/Another/path/to/a/file.txt'
    copy_with_callback(src, dest, example_callback_function)

An advantage is that this Python code doesn't depend on OS-specific functionality.
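Since the goal is to push updates over Flask-SocketIO, the callback is the natural hook. A sketch of a throttling wrapper that fits this answer's callback signature (`emit_fn` stands in for whatever emit function you have in scope; the Flask-SocketIO specifics are assumptions and not shown here):

```python
def make_progress_callback(emit_fn, step=5):
    """Build a callback matching (src_size, dst_size) that calls
    emit_fn(percent) only when progress advances by at least `step`
    percent, so a fast copy doesn't flood the socket.

    emit_fn is any callable taking an int, e.g. a function that wraps
    a Flask-SocketIO emit (hypothetical; not shown here).
    """
    last = [-step]  # mutable cell so the closure can update it

    def callback(src_size, dst_size):
        percent = int(100 * dst_size / src_size)
        if percent - last[0] >= step or percent == 100:
            last[0] = percent
            emit_fn(percent)

    return callback
```

Passed as the third argument to `copy_with_callback`, this fires `emit_fn` roughly every `step` percent plus once at completion.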

Bruce Peterson