
I am transferring a 150-200 MB file to many locations (shared drives located across the world) daily. The issue is that each transfer (using shutil) takes roughly 100-700 seconds, and each one has to complete before the next one can begin. It now takes close to a full hour to transfer some files that way. My temporary workaround was to create a separate .py file to run for each location so the copies happen simultaneously, but that is not ideal.

How can I get started with multithreaded programming? I'd like to run all of the transfers at once, but I have zero experience with this.

A quick Google search landed me at:

https://docs.python.org/3/library/concurrent.futures.html.

from concurrent.futures import ThreadPoolExecutor
import shutil
with ThreadPoolExecutor(max_workers=4) as e:
    e.submit(shutil.copy, 'src1.txt', 'dest1.txt')
    e.submit(shutil.copy, 'src2.txt', 'dest2.txt')
    e.submit(shutil.copy, 'src3.txt', 'dest3.txt')
    e.submit(shutil.copy, 'src4.txt', 'dest4.txt')

Can someone point me in the right direction? I have been meaning to learn how to do things in parallel for a while now but never got around to it.

trench
  • Perhaps [this post](http://stackoverflow.com/questions/20887555/dead-simple-example-of-using-multiprocessing-queue-pool-and-locking) will help you. I've found the accepted answer to be a useful example. – Stephen B Sep 21 '16 at 17:47
  • It looks like you could also be renaming the files; is that correct? – martineau Sep 21 '16 at 18:09
  • That code snippet is from the docs I linked (when I used ctrl+F for shutil). I won't be changing the name though. shutil.copy2('../dashboard.twbx', '//sharedpath/region/dashboard/') – trench Sep 21 '16 at 18:11
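
For completeness, the multiprocessing route suggested in the first comment above would also work here; a minimal sketch with a process pool, assuming the same placeholder file names as the snippet in the question, might look like this:

from multiprocessing import Pool
import shutil

# placeholder (source, destination) pairs, matching the question's snippet
COPIES = [
    ('src1.txt', 'dest1.txt'),
    ('src2.txt', 'dest2.txt'),
    ('src3.txt', 'dest3.txt'),
    ('src4.txt', 'dest4.txt'),
]

def copy_one(pair):
    """Copy a single (source, destination) pair and report which one finished."""
    src, dst = pair
    shutil.copy2(src, dst)
    return dst

if __name__ == '__main__':  # the guard is required for multiprocessing on Windows
    with Pool(processes=4) as pool:
        for dst in pool.imap_unordered(copy_one, COPIES):
            print('copy to "{}" finished'.format(dst))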

1 Answer


Here's a working example that does what you want. Note that it may not be any faster than one-at-a-time if the bottleneck is network bandwidth.

from concurrent.futures import ThreadPoolExecutor
import os
import shutil
import time
from threading import Lock

src_dir = './test_src'
src_files = 'src1.txt', 'src2.txt', 'src3.txt', 'src4.txt'
dst_dir = './test_dst'
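# assumes both directories already exist and that src_dir contains the files listed above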
print_lock = Lock()

_print = print  # save original
def print(*args, **kwargs):
    """Prevents concurrent printing."""
    with print_lock:
        _print(*args, **kwargs)

def copy_file(src_file):
    src_file = os.path.join(src_dir, src_file)
    print('starting transfer of "{}"'.format(src_file))
    shutil.copy2(src_file, dst_dir)
    print('transfer of "{}" completed'.format(src_file))

with ThreadPoolExecutor(max_workers=4) as e:
    jobs = [e.submit(copy_file, src_file) for src_file in src_files]
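    # leaving the 'with' block waits for every submitted job to finish,
    # so the loop below is only a final check before reporting completion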

while any(job.running() for job in jobs):
    time.sleep(.1)
print('done')
martineau
  • Hi, thanks. I tried this out and it copied the first file but not the second. It looked like the code was still running though and the print showed 'Starting transfer of filename' for both, but only one completed and then it printed 'done'. Also, is it possible to tweak this to focus on sharing one file to several destinations? – trench Sep 22 '16 at 18:27
  • Hmm, worked for me locally (not to/from locations on my LAN, but that shouldn't matter). If it printed `done`, then all the `shutil.copy2()` calls returned without any errors being raised. Perhaps `copy2` doesn't support multithreading (but I got away with using it because my test files were very small). To copy the same file to more than one place, change `copy_file()` to accept an additional `dst_dir` argument and then call it multiple times, once for each of the destinations. – martineau Sep 22 '16 at 18:42
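
To make that concrete, a minimal sketch of the change suggested in that last comment, assuming a hypothetical list of destination shares, could look like the following; it is the answer's copy_file() reworked to take the destination directory as a second argument:

from concurrent.futures import ThreadPoolExecutor
import shutil

src_file = '../dashboard.twbx'  # the single file to distribute (placeholder)
dst_dirs = [                    # hypothetical destination shares
    '//sharedpath/region1/dashboard/',
    '//sharedpath/region2/dashboard/',
    '//sharedpath/region3/dashboard/',
]

def copy_file(src_file, dst_dir):
    """Copy one source file to one destination directory."""
    print('starting transfer of "{}" to "{}"'.format(src_file, dst_dir))
    shutil.copy2(src_file, dst_dir)
    print('transfer to "{}" completed'.format(dst_dir))

with ThreadPoolExecutor(max_workers=len(dst_dirs)) as e:
    for dst_dir in dst_dirs:
        e.submit(copy_file, src_file, dst_dir)
# exiting the 'with' block waits for all of the copies to finish

print('done')

As in the answer above, a print lock could be added if the interleaved output from the worker threads matters.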