5

I am trying to write a multithreaded program in Python to accelerate the copying of (under 1000) .csv files. The multithreaded code runs even slower than the sequential approach. I timed the code with profile.py. I am sure I must be doing something wrong but I'm not sure what.

The Environment:

  • Quad core CPU.
  • 2 hard drives, one containing source files. The other is the destination.
  • 1000 csv files ranging in size from several KB to 10 MB.

The Approach:

I put all the file paths in a Queue and create 4-8 worker threads that pull file paths from the queue and copy the designated files. In no case is the multithreaded code faster:

  • sequential copy takes 150-160 seconds
  • threaded copy takes over 230 seconds

I assume this is an I/O-bound task, so multithreading should help the operation speed. Is there anything wrong with my approach? Or should I use coroutines to do the job instead?

The Code:

    import Queue
    import threading
    import cStringIO 
    import os
    import shutil
    import timeit  # time the code execution with gc disabled
    import glob    # file wildcard lists, e.g. glob.glob('*.py')
    import profile

    fileQueue = Queue.Queue() # global
    srcPath  = 'C:\\temp'
    destPath = 'D:\\temp'
    tcnt = 0
    ttotal = 0

    def CopyWorker():
        while True:
            fileName = fileQueue.get()
            fileQueue.task_done()
            shutil.copy(fileName, destPath)
            #tcnt += 1
            print 'copied: ', tcnt, ' of ', ttotal

    def threadWorkerCopy(fileNameList):
        print 'threadWorkerCopy: ', len(fileNameList)
        ttotal = len(fileNameList)
        for i in range(4):
            t = threading.Thread(target=CopyWorker)
            t.daemon = True
            t.start()
        for fileName in fileNameList:
            fileQueue.put(fileName)
        fileQueue.join()

    def sequentialCopy(fileNameList):
        #around 160.446 seconds, 152 seconds
        print 'sequentialCopy: ', len(fileNameList)
        cnt = 0
        ctotal = len(fileNameList)
        for fileName in fileNameList:
            shutil.copy(fileName, destPath)
            cnt += 1
            print 'copied: ', cnt, ' of ', ctotal

    def main():
        print 'this is main method'
        fileCount = 0
        fileList = glob.glob(srcPath + '\\' + '*.csv')
        #sequentialCopy(fileList)
        threadWorkerCopy(fileList)

    if __name__ == '__main__':
        profile.run('main()')
Jason Sundram
steveoreo
    You say "I assume it's I/O bound" then say "multithreading should help operation speed." You're looking at it wrong. I/O bound means that it's bound by I/O, not CPU. If it was CPU bound, multithreading would help. – Mike Bailey Dec 21 '11 at 03:40
  • 1
    Of course it's slower - do you have a set of hard-drive heads for each thread you're spawning? – wim Dec 21 '11 at 03:55
  • compare it to `robocopy C:\temp D:\temp *.csv` – jfs Dec 21 '11 at 04:01
  • thanks everyone, it's good to learn more! – steveoreo Dec 21 '11 at 04:39

6 Answers

10

Of course it's slower. The hard drives are having to seek between the files constantly. Your belief that multi-threading would make this task faster is completely unjustified. The limiting speed is how fast you can read data from or write data to the disk, and every seek from one file to another is a loss of time that could have been spent transferring data.

David Schwartz
2

I think I can verify that it is a disk I/O situation. I did a similar test on my machine, copying from an extremely fast network server back onto itself, and I saw a speedup of nearly 1:1 with the thread count just using your code above (4 threads). My test copied 4137 files totaling 16.5 GB:

  • Sequential copy: 572.033 seconds.
  • Threaded (4) copy: 180.093 seconds.
  • Threaded (10) copy: 110.155 seconds.
  • Threaded (20) copy: 86.745 seconds.
  • Threaded (40) copy: 87.761 seconds.

As you can see, there is a bit of a "falloff" as you get into higher and higher thread counts, but at 4 threads I had a huge speed increase. I'm on a VERY fast computer with a very fast network connection, so I think I can safely assume that you are hitting an I/O limit.

That said, check out the response I got here: Python multiprocess/multithreading to speed up file copying. I haven't had a chance to try this code out yet, but it is possible that gevent could be faster.

Spencer
  • I could also verify this on a 1 Gbit Ethernet network by copying 500 MB of data from one filer location to another. A sequential copy took 45 seconds, vs. 17.4 seconds for a copy with 16 worker threads using the code above. Thread-based copying shows a performance boost for I/O-bound work. – sudheer Jul 29 '20 at 19:06
0

As an aside, I just wanted to add that the above code is slightly wrong. You should call fileQueue.task_done() AFTER shutil.copy(fileName, destPath); otherwise fileQueue.join() can return while the daemon worker threads are still copying, and the last files will not be copied. :)
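A minimal sketch of the corrected worker, reusing the names from the question's code:

    def CopyWorker():
        while True:
            fileName = fileQueue.get()
            try:
                shutil.copy(fileName, destPath)
            finally:
                # mark the task done only after the copy finishes, so that
                # fileQueue.join() in threadWorkerCopy() waits for all copies
                fileQueue.task_done()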

hannes.koller
0

There are CPU-bound applications and I/O-bound applications. You can normally get an almost linear benefit from a multithreaded app when its sequential version is CPU bound, but when you are I/O bound you gain nothing. Many operating systems can show you the CPU's "busy time percentage" and the disk's "busy time percentage", so you can tell which case you are in.
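For example, here is a minimal sketch of such a check using the third-party psutil package (psutil and the one-second polling interval are my assumptions; any system monitor shows the same numbers). Run it while the copy is in progress:

    import psutil  # third-party: pip install psutil

    def watchUtilization(seconds=30):
        # print CPU busy percentage and disk throughput once per second
        last = psutil.disk_io_counters()
        for _ in range(seconds):
            cpu = psutil.cpu_percent(interval=1)  # blocks for ~1 second
            now = psutil.disk_io_counters()
            print 'CPU: %5.1f%%  read: %6d KB/s  write: %6d KB/s' % (
                cpu,
                (now.read_bytes - last.read_bytes) / 1024,
                (now.write_bytes - last.write_bytes) / 1024)
            last = now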

However, because sequential code is normally not async, you end up fetching one file, waiting for that copy to finish, and only then moving to the next file. This prevents the operating system from seeing the whole list of files and prioritizing the read requests based on their physical location on the disk.

Conclusion: if you are looking for maximum performance, go single-threaded but use async APIs to allow the OS to schedule the read requests better.

0

The multithreaded I/O approach is only advantageous when the transfer runs over a high-latency TCP connection, where TCP windowing limits the throughput of a single connection. Creating multiple TCP connections between the source and target and interleaving the files across those connections (which requires multithreading) CAN perform much better than a standard FTP or NFS copy; programs like NetApp's XCP do exactly this. If your latency is low or the copy is local, the only efficiency left to gain is working around whatever bottleneck the file system itself imposes (e.g. millions of files), and the answer to that is not what you're doing.

Glenn Dekhayser
0

I assume this is more a I/O bound task, multithread should help the operation speed, anything wrong with my approach?!

Yes.

  1. Too many punctuation marks. Just one. "?" is appropriate.

  2. Your assumption is wrong. Multithreading helps CPU-bound code (sometimes). It can never help I/O-bound code. Never.

All threads in a process must wait while one thread does I/O.

or coroutine to do the job?!

No.

If you want to do a lot of I/O, you need a lot of processes.

If you're copying 1000 files, you need many, many processes. Each process copies some of the files.
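A minimal sketch of that idea with the standard library's multiprocessing module (the pool size and paths are illustrative, not from the question):

    import glob
    import shutil
    from multiprocessing import Pool

    destPath = 'D:\\temp'

    def copyOne(fileName):
        # each call copies a single file; the pool spreads the calls
        # across the worker processes
        shutil.copy(fileName, destPath)

    if __name__ == '__main__':
        fileList = glob.glob('C:\\temp\\*.csv')
        pool = Pool(processes=8)  # 8 processes, each copies some of the files
        pool.map(copyOne, fileList)
        pool.close()
        pool.join()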

S.Lott
  • Technically multithreading can help in some I/O-bound instances. If the I/O is high latency (such as a web request), multiple threads of execution can help. But generally, I do agree with you. – Mike Bailey Dec 21 '11 at 03:45