I have a file to download (the download URL is extracted from JSON, e.g. http://testsite/abc.zip).

I need help making 5 threads each download abc.zip to the output directory, with the downloads running asynchronously or concurrently.

With the code below, the file is downloaded 5 times, but the downloads run one after another (synchronously).

What I want is for the downloads to run simultaneously.

def dldr(file=file_url, outputdir=out1):
    local_fn = str(uuid.uuid4())
    if not os.path.exists(outputdir):
        os.makedirs(outputdir)
    s = datetime.now()
    urllib.urlretrieve(file, outputdir + os.sep + local_fn)
    e = datetime.now()
    time_diff = e - s
    logger(out1, local_fn, time_diff)

for i in range(1, 6):
    t = threading.Thread(target=dldr())
    t.start()

I have read the "Requests with multiple connections" post. It is helpful, but it doesn't address the requirement described here.

Martin Prikryl
kmhussain
  • So what's your question? – user2266449 Nov 05 '15 at 10:19
  • The code above will download a file to a location. How to implement multi-Threading here to download the file simultaneously by 5 agents? – kmhussain Nov 05 '15 at 11:56
  • Oh, you want a multi-threaded download of a single file. Then it is a duplicate question: http://stackoverflow.com/questions/9701682/download-a-single-file-using-multiple-threads – cvakiitho Nov 06 '15 at 14:27
  • Possible duplicate of [Requests with multiple connections](http://stackoverflow.com/questions/13973188/requests-with-multiple-connections) – cvakiitho Nov 06 '15 at 14:30
  • The examples show how multiple threads can download a single file in byte ranges. What I need is for multiple threads to each download the whole file, without any ranges, e.g. 5 threads downloading the same file 5 times concurrently. The current edited code does download the file 5 times, but it downloads them one by one. I want the downloads to be simultaneous. – kmhussain Nov 07 '15 at 08:36
  • In that case my answer will work that way; you will just need to pick a different name for each file. Let me edit it a bit. – cvakiitho Nov 09 '15 at 09:22
  • It did solve my requirement. Thanks a lot @cvakiitho !!! – kmhussain Nov 09 '15 at 10:29

1 Answer

I use the threading module to run each download in its own thread.
I also use requests for the HTTP call, but you can swap that for urllib yourself.

import threading
import requests

def download(link, filelocation):
    r = requests.get(link, stream=True)
    with open(filelocation, 'wb') as f:
        for chunk in r.iter_content(1024):
            if chunk:
                f.write(chunk)

def createNewDownloadThread(link, filelocation):
    download_thread = threading.Thread(target=download, args=(link,filelocation))
    download_thread.start()

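for i in range(5):
    # Give every download its own file name so the threads don't overwrite each other.
    filelocation = "C:\\test" + str(i) + ".png"
    print(filelocation)
    createNewDownloadThread("http://stackoverflow.com/users/flair/2374517.png", filelocation)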
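
A side note on why the loop in the question runs one download at a time: `threading.Thread(target=dldr())` calls `dldr` immediately in the main thread and hands its return value (`None`) to `Thread`, so every download finishes before the next thread is even created. Passing the function object and its arguments separately avoids that. Below is a minimal sketch of the question's function rewritten that way, assuming Python 3 (where `urlretrieve` lives in `urllib.request`). The `fetch` parameter and the `download_concurrently` helper are my own additions for illustration, not part of the original code, and the question's `logger` call is left out.

```python
import os
import threading
import uuid
from urllib.request import urlretrieve  # Python 3 location of urllib.urlretrieve


def dldr(file_url, outputdir, fetch=urlretrieve):
    """Download file_url into outputdir under a random name; return the path."""
    os.makedirs(outputdir, exist_ok=True)
    local_path = os.path.join(outputdir, str(uuid.uuid4()))
    fetch(file_url, local_path)  # fetch is a hypothetical hook, defaults to urlretrieve
    return local_path


def download_concurrently(file_url, outputdir, copies=5, fetch=urlretrieve):
    """Start `copies` downloads of the same URL at once and wait for all of them."""
    threads = [
        # Pass the function itself plus its arguments. Writing target=dldr()
        # would run the download right here, before the thread exists.
        threading.Thread(target=dldr, args=(file_url, outputdir, fetch))
        for _ in range(copies)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # block until every download has finished
```

Calling `download_concurrently("http://testsite/abc.zip", "out")` should then start all five downloads before waiting on any of them.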
Martin Prikryl
cvakiitho
  • Why should we download in chunks and not just f.write(r.content)? – GarfieldCat Dec 15 '19 at 22:33
  • @GarfieldCat, because downloading without chunks reads the whole file into memory, which will cause memory problems for very large files (think > 1 GB). With chunks, only one chunk is held in memory at a time. – olumidesan Dec 26 '19 at 22:27
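
To illustrate the point in the last comment: the write loop only ever holds one chunk at a time, whatever the total size. This is a minimal sketch without any network access. `save_chunks` and `fake_body` are hypothetical names, and `fake_body` merely stands in for what `r.iter_content(1024)` yields in the answer.

```python
import os


def save_chunks(chunks, path):
    """Write an iterable of byte chunks to path; return the total bytes written."""
    total = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                f.write(chunk)
                total += len(chunk)
    return total


def fake_body(size, chunk_size=1024):
    """Hypothetical stand-in for a streamed response body of `size` bytes."""
    for offset in range(0, size, chunk_size):
        # Only this one chunk exists in memory at any moment.
        yield b"\0" * min(chunk_size, size - offset)
```

With requests, the equivalent call would be `save_chunks(r.iter_content(1024), filelocation)` on a response opened with `stream=True`.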