
How can I view the current progress of this request? Nothing is shown until the file completes, and I would like to set some sort of indicator to show whether the request is still active.

import requests

with open('file.txt', 'r') as f:
    urls = f.readlines()

datalist=[]
for url in urls:
    data = requests.get(url)
    datalist.append(data.text)

with open('file_complete.txt', 'w') as f:
    for item in datalist:
        f.write("%s\n" % item)
mjbaybay7
  • You can add a `print()` statement before the `requests.get(url)` and after `datalist.append(data.text)`. That way you can at least track the progress by URL. – Timothy Wong Aug 07 '20 at 03:46
  • If you want the progress in the file to follow as well, you should nest the `with` statement in the `for` loop -- that way the results of each `requests.get(url)` will be written to the file every time it successfully `gets` the `url` (hint: if you do that you no longer need `datalist`) – Timothy Wong Aug 07 '20 at 03:48
  • @TimothyWong Can you please explain this a bit more? I'm not understanding. Thanks! – mjbaybay7 Aug 07 '20 at 03:48
  • I'll post as an answer for better clarity – Timothy Wong Aug 07 '20 at 03:49
  • Are you downloading some big file? – Mooncrater Aug 07 '20 at 05:47
  • @Mooncrater Yes I am downloading a big file – mjbaybay7 Aug 07 '20 at 14:40
  • @mjbaybay7 I think you want to know how much of the file is downloaded. Look at the `iter_content` functionality of `requests`. You can find more [here](https://stackoverflow.com/questions/16694907/download-large-file-in-python-with-requests) – Mooncrater Aug 07 '20 at 17:24 (see the sketch below)
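
For the large-file case mentioned in the last comment, here is a minimal sketch of the `iter_content` streaming approach. The URL, output filename, chunk size, and timeout are illustrative assumptions, not part of the original question:

import requests

# Illustrative values; replace with the real URL and filename.
url = "https://example.com/big-file.bin"

with requests.get(url, stream=True, timeout=30) as response:
    response.raise_for_status()
    total = int(response.headers.get("Content-Length", 0))  # 0 if the server does not report a size
    downloaded = 0

    with open("big-file.bin", "wb") as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)
            downloaded += len(chunk)
            if total:
                print("\r{:.0%} downloaded".format(downloaded / total), end="")
    print()

With stream=True the response body is fetched in chunks as you iterate, so progress can be reported while the download is still running.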

2 Answers


requests.get() is a blocking call. If you would like a bit more control, you could send your requests in individual threads; you could also add timeouts if that is a concern. But no, there is no way to check the progress of an in-progress get() request.
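
A minimal sketch of that threaded approach using concurrent.futures; the worker function, worker count, and timeout value are assumptions for illustration, not part of the original answer:

import requests
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch(url):
    # Runs in its own thread; the 10-second timeout is an assumed value.
    response = requests.get(url, timeout=10)
    return response.text

with open('file.txt', 'r') as f:
    urls = [line.strip() for line in f]

datalist = []
with ThreadPoolExecutor(max_workers=5) as executor:
    futures = {executor.submit(fetch, url): url for url in urls}
    for future in as_completed(futures):
        url = futures[future]
        try:
            datalist.append(future.result())
            print("{} done ({}/{})".format(url, len(datalist), len(urls)))  # per-URL progress indicator
        except requests.exceptions.RequestException as exc:
            print("{} failed: {}".format(url, exc))

Each call still blocks its own thread, but the per-URL prints give a rough progress indicator, and slow URLs no longer hold up the others.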

gesingle

You can add a print() statement before the requests.get(url) and after datalist.append(data.text). That way you can at least track the progress by URL.

for url in urls:
    print("Getting " + url)
    data = requests.get(url)
    datalist.append(data.text)
    print(url + " successfully downloaded")

Your code, however, only writes to the file once all URLs have been downloaded. If the program fails at any point, file_complete.txt will never be created, so I suggest writing to the file as soon as each URL download succeeds.

import requests

with open('file.txt', 'r') as f:
    urls = f.readlines()

# datalist = []   # no longer needed
for url in urls:
    data = requests.get(url)

    with open('file_complete.txt', 'a+') as f:   #change to mode "a+" to append
        f.write(data.text + "\n")

Another improvement that can be made: your code assumes that all the URLs are valid. We can use a try-except block to catch request errors.

import requests

with open('file.txt', 'r') as f:
    urls = f.readlines()

# datalist = []   # no longer needed
for url in urls:
    try:
        data = requests.get(url)
    except requests.exceptions.RequestException:
        print(url + " failed")
        continue   # move on to the next url; nothing to write to the file

    with open('file_complete.txt', 'a+') as f:   #change to mode "a+" to append
        f.write(data.text + "\n")

Timothy Wong