I am trying to download a large file with the Python requests library by setting stream=True.

However, I want this function to run asynchronously: the server should send its response right away while the download continues in the background.

Here is my code:

import asyncio
import os

import requests

async def downloadFile(url, filename):
  r = requests.get(url, stream=True)
  with open(os.path.join('./files', filename), 'wb+') as f:
    for chunk in r.iter_content(chunk_size=1024):
        if chunk:
            f.write(chunk)
  # Create an empty file with the same name plus "_done"
  # to signal that the download has finished
  with open(os.path.join('./files', filename + '_done'), 'w+') as f:
    pass
  await asyncio.sleep(1)

I call this function from another function like this:

        # Check whether the file exists on the server
        if os.path.exists(os.path.join('./files', fileName)):
            # The file exists; check whether the _done marker exists too
            if os.path.exists(os.path.join('./files', fileName + '_done')):
                # Download finished: redirect to the file
                self.redirect(self.request.protocol + "://" +
                              self.request.host + '/files/' + fileName)
            else:
                # No _done marker yet: still downloading, ask the user to wait
                self.write('Wait 5 min')
                self.finish()
        else:
            # The file doesn't exist: respond, then start the download
            self.write('Wait 5 min')
            self.finish()
            d = asyncio.ensure_future(downloadFile(
                fileRes, fileName))
            # loop = asyncio.get_event_loop()
            # loop.run_until_complete(d)

The problem is that the file is created but its size remains 0, and the file with "_done" appended is never created. What am I doing wrong here?

Shahrukh Shahid
  • Your code works for me. Perhaps [enable debugging for requests](https://stackoverflow.com/a/16630836/6085135)? – brennan Dec 20 '18 at 17:21
  • Your code works for me too – zmo Dec 20 '18 at 17:21
  • This code works the first time only. If I immediately send another request, it waits for the whole file to download. I have updated my question with details about how the download function is called. Please check. – Shahrukh Shahid Dec 20 '18 at 18:52

1 Answer

Your code works for me. Maybe the resource you're trying to fetch is the part that does not work.

You might want to try enabling debug logging for requests as suggested by @brennan, and/or add printouts to your code to follow what's happening:

>>> import asyncio
>>> import os
>>> import requests
>>> 
>>> 
>>> async def downloadFile(url, filename):
...   print(f"• downloadFile({url}, {filename})")
...   r = requests.get(url, stream=True)
...   print(f" → r: {r}")
...   with open(os.path.join('./files', filename), 'wb+') as f:
...     print(f" → f is opened: {f}")
...     for chunk in r.iter_content(chunk_size=1024):
...         print(f"  → chunk is: {chunk}")
...         if chunk:
...             f.write(chunk)
...   # Creating same file name
...   # with _done appended to know that file has been downloaded
...   with open(os.path.join('./files', filename + '_done'), 'w+') as f:
...     print(f" → creating output with _done")
...     f.close()
...   print(f" → wait 1")
...   await asyncio.sleep(1)
... 
>>> 
>>> 
>>> d = asyncio.ensure_future(downloadFile('https://xxx/yyy.jpg', 'test.jpg'))
>>> loop = asyncio.get_event_loop()
>>> loop.run_until_complete(d)
• downloadFile(https://xxx/yyy.jpg, test.jpg)
 → r: <Response [200]>
 → f is opened: <_io.BufferedRandom name='./files/test.jpg'>
  → chunk is: b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00\xff\xdb\x00C\x00\r\t\n\x0b\n\x08\r\x0b\n\x0b\x0e\x0e\r\x0f\x13....'
  → chunk is: ...
  ...
 → creating output with _done
 → wait 1

With printouts like these, the `_done` part of your code becomes unnecessary: the printouts already tell you when the download has finished. The same goes for the wait at the end (when it's done… it's done!).

async def downloadFile(url, filename):
  r = requests.get(url, stream=True)
  with open(os.path.join('./files', filename), 'wb+') as f:
    for chunk in r.iter_content(chunk_size=1024):
        if chunk:
            f.write(chunk)
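
If the `_done` marker was mainly there so that other code never sees a half-written file, one common alternative is to write to a temporary file and rename it once the download is complete. The following is only a sketch of that idea, not something taken from the question: `save_atomically` is a hypothetical helper name, and `chunks` is assumed to be any iterable of byte strings, such as `r.iter_content(chunk_size=1024)`.

```python
import os
import tempfile


def save_atomically(chunks, dest_path):
    """Write byte chunks to a temporary file, then rename it to dest_path.

    Hypothetical helper: dest_path only appears once every chunk has been
    written, so its mere existence can play the role of the _done marker.
    """
    dest_dir = os.path.dirname(dest_path) or "."
    os.makedirs(dest_dir, exist_ok=True)
    # Keep the temporary file in the destination directory so that
    # os.replace() never has to cross a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as f:
            for chunk in chunks:
                if chunk:  # skip empty keep-alive chunks
                    f.write(chunk)
        os.replace(tmp_path, dest_path)  # swaps the finished file into place
    except BaseException:
        os.remove(tmp_path)  # do not leave a partial file behind
        raise
```

Inside `downloadFile` this could be called as `save_atomically(r.iter_content(chunk_size=1024), os.path.join('./files', filename))`. Note that the "download in progress" state would then have to be tracked some other way, since the final file name no longer exists while the download is running.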

You might also want to catch errors that occur while connecting to the server and act accordingly:

async def downloadFile(url, filename):
  try:
    r = requests.get(url, stream=True)
    r.raise_for_status() # to raise on invalid statuses
    with open(os.path.join('./files', filename), 'wb+') as f:
      for chunk in r.iter_content(chunk_size=1024):
          if chunk:
              f.write(chunk)
  except requests.RequestException as err:
    # do something smart when that exception occurs!
    print(f"Exception has occurred: {err}")
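
One thing the examples above do not change: `requests.get` and `iter_content` are blocking calls, so even inside an `async def` they hold up the event loop until the download has finished. That may be related to the behaviour described in the comments, where a second request waits for the whole download. A possible way around it is to hand the blocking work to a worker thread with `run_in_executor`. The sketch below is an assumption-laden illustration: `blocking_download`, `download_in_background` and `demo` are hypothetical names, and `time.sleep` stands in for the real `requests` code so the example runs without network access.

```python
import asyncio
import time


def blocking_download(url, filename):
    """Stand-in for the blocking body of downloadFile (hypothetical).

    In real code this would contain the requests.get(url, stream=True)
    call and the iter_content() write loop; time.sleep() imitates the wait.
    """
    time.sleep(0.2)
    return filename


async def download_in_background(url, filename):
    # run_in_executor runs the blocking function in the default thread
    # pool, so the event loop stays free to handle other work meanwhile.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, blocking_download, url, filename)


async def demo():
    task = asyncio.ensure_future(
        download_in_background("https://xxx/yyy.jpg", "test.jpg"))
    ticks = 0
    while not task.done():
        await asyncio.sleep(0.01)  # the loop keeps ticking during the download
        ticks += 1
    return task.result(), ticks


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In the handler from the question, the call would presumably become `asyncio.ensure_future(download_in_background(fileRes, fileName))` after `self.finish()`; this has not been tested against that setup.
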
zmo
  • Actually, this works the first time only. If I immediately request again, it waits for a response until the file has been downloaded. Here is the workflow: (1) The script looks for the file on the server. If found, it returns it; if not, it starts the download on the server after sending the user the response "wait 5 min". (2) If the file has been downloaded and the "_done" file exists, it sends the file to the requesting user. (3) If the file exists but the "_done" file does not, the file must still be downloading on the server, so it tells the user to wait 5 min. – Shahrukh Shahid Dec 20 '18 at 18:45
  • I have updated my question with more details on how the download is initiated. – Shahrukh Shahid Dec 20 '18 at 18:51
  • OK, TBH, I don't understand your requirements or what you're trying to achieve… I think you're in a deep case of the [X-Y Problem](https://xyproblem.org). It looks like you're trying to work around download issues by brute force, downloading the file again until it eventually works. You can achieve that more elegantly without the `_done` file, by using `requests` statuses (cf. the last example in my answer). But the right way depends mainly on why you need to watch the file and redownload it. – zmo Dec 21 '18 at 13:21