Download files asynchronously with multiple connections using Python

Question

I want to know how to write a Python download manager that downloads multiple files concurrently, each over multiple connections (32 connections). It should provide the following features: start time, elapsed time, estimated time of arrival (ETA), completion time, downloaded size, full size, current, minimum, average and maximum download speed, pause, resume and stop, and a progress bar. I am working on a PyQt6 project and I want to embed the progress bar in the GUI.

Currently I can only download files synchronously and with only one connection:

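import requests
from pathlib import Path

# url and filepath are assumed to be defined elsewhere
data = requests.get(url).content  # blocks until the whole body has been received
Path(filepath).write_bytes(data)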

This method is slow, but the start time and end time can be easily determined in this way:

import datetime
import requests
from pathlib import Path
start = datetime.datetime.now()
data = requests.get(url).content
Path(filepath).write_bytes(data)
end = datetime.datetime.now()
total_elapsed_time = end - start

But how can I get the elapsed time while the download is still running?

I have a (very) vague idea that I can use threading.Thread to run a while loop that does some work, calls time.sleep(1), and repeats until the download has completed, but I don't know how to determine whether a command has completed.

The following can download different files concurrently. Each file uses only one connection, but unfinished downloads don't prevent new downloads from starting:

import threading
from pathlib import Path

import requests

def download(url, filepath):
    # runs in its own thread, so the caller is not blocked
    data = requests.get(url).content
    Path(filepath).write_bytes(data)

th = threading.Thread(target=download, args=(url, filepath))
th.start()

How can I determine the completion status of the threads and use that status as the condition of a while loop?
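One way to read the polling idea above is sketched here, under assumptions: `fake_download` is a hypothetical stand-in that only sleeps, and `run_and_monitor` is an illustrative name. It relies on `threading.Thread.is_alive()` to tell whether the worker has finished, and on `join(timeout=...)` in place of `time.sleep` so the loop wakes up as soon as the worker ends.

```python
import threading
import time

def fake_download(duration):
    # Hypothetical stand-in for a real download; it only sleeps.
    time.sleep(duration)

def run_and_monitor(target, args=(), poll_interval=0.05):
    """Run target in a thread and sample the elapsed time until it finishes."""
    worker = threading.Thread(target=target, args=args)
    start = time.monotonic()
    worker.start()
    samples = []
    while worker.is_alive():                # completion status of the thread
        samples.append(time.monotonic() - start)
        worker.join(timeout=poll_interval)  # wait, but return early if the worker ends
    total = time.monotonic() - start
    return samples, total

samples, total = run_and_monitor(fake_download, args=(0.2,))
print(f"{len(samples)} samples, finished after {total:.2f} s")
```

In a GUI, each sample would presumably be pushed to a widget instead of a list; the list here only keeps the sketch self-contained.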

I know I can get total download size using this:

requests.head(url).headers.get('content-length')

But I really don't know how to get the number of bytes already downloaded while a download is in progress. I know I can get the size of a file that has finished downloading using this:

Path(filepath).stat().st_size

But I don't know whether this works while the file is still being downloaded, because I guess a file that is being downloaded would be locked for reading and writing?
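An alternative that avoids asking the file system at all is to count bytes as they are written. The following is a minimal sketch, assuming the data arrives in chunks; `ByteCounter` and `save_stream` are hypothetical names, and the chunks are faked with a list. With requests, the chunks would presumably come from `requests.get(url, stream=True).iter_content(chunk_size=...)`.

```python
import io
import threading

class ByteCounter:
    """Thread-safe count of the bytes received so far."""

    def __init__(self):
        self._lock = threading.Lock()
        self._count = 0

    def add(self, n):
        with self._lock:
            self._count += n

    @property
    def value(self):
        with self._lock:
            return self._count

def save_stream(chunks, fileobj, counter):
    """Write each chunk to fileobj and record its size in counter."""
    for chunk in chunks:
        fileobj.write(chunk)
        counter.add(len(chunk))

# Demo with fake data instead of a network response.
counter = ByteCounter()
buffer = io.BytesIO()
save_stream([b"x" * 1000] * 3, buffer, counter)
print(counter.value)  # 3000
```

A monitor thread can read `counter.value` at any time, and several download threads can share one counter.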

The only relevant thing I found using Google is this: https://stackoverflow.com/a/15645088/16383578, but that answer is for Python 2 while I am using Python 3. I also need to use PyQt6's progress bar, and I don't think it will get along well with threading.

So far I have a vague concept of what I would describe as nested threading and parallel threading. In this concept, each download is a job in the main thread, and each download job is composed of two threads: a worker and a monitor.

The worker is composed of 32 threads. Each downloads a separate portion of the file, and they don't interfere with each other.

For each worker there is a corresponding monitor that runs in parallel with the worker and starts and ends when the worker starts and ends.

The monitor is a while loop that keeps a list of download speeds. On each iteration it does the following, then sleeps for a second. It takes the difference between the current time and the start time and sets the elapsed time to that value (updating a QTimeEdit). It gets the downloaded file size and subtracts the value it got last time; because the samples are one second apart, that difference is the current download speed, which it appends to the speed list. It then uses min, max, and sum divided by len on the list to get the minimum, maximum and average download speeds, and updates the four corresponding widgets (current, min, max, mean) with these values.

It also uses the current downloaded size to update the corresponding widget, downloaded / total to update the progress bar, and (total - downloaded) / mean_speed + current_time to get the ETA.
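The arithmetic of the monitor can be sketched as a pure function, separate from any threads or widgets. This is only an illustration of the formulas above; `summarize` and its parameters are hypothetical names, and `samples` is assumed to hold the cumulative downloaded size taken once per `interval` seconds.

```python
from datetime import datetime, timedelta

def summarize(samples, total_size, now, interval=1.0):
    """Compute speed statistics, progress and ETA from cumulative byte samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute a speed")
    # Speed in each interval is the difference between consecutive samples.
    speeds = [(b - a) / interval for a, b in zip(samples, samples[1:])]
    downloaded = samples[-1]
    mean = sum(speeds) / len(speeds)
    # ETA = (total - downloaded) / mean_speed + current_time; undefined if stalled.
    eta = now + timedelta(seconds=(total_size - downloaded) / mean) if mean > 0 else None
    return {
        "current": speeds[-1],
        "min": min(speeds),
        "max": max(speeds),
        "mean": mean,
        "progress": downloaded / total_size,
        "eta": eta,
    }

stats = summarize([0, 100, 300, 400], total_size=1000, now=datetime(2024, 1, 1, 12, 0, 0))
print(stats["current"], stats["min"], stats["max"], stats["progress"])
```

Here the per-second speeds are 100, 200 and 100 bytes/s, so the mean is about 133.3 bytes/s and the remaining 600 bytes are estimated to take about 4.5 seconds.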

All the above operations are done asynchronously (of course, any variable an operation needs will be defined before the operation runs).

The worker downloads one file in 32 threads: each thread downloads a different part of the file and uses its own connection. Each thread is identified by an integer from range(32) (0 inclusive, 32 exclusive), and the parts are assigned like this:

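chunk = total // 32  # integer division, so the byte offsets stay integers
for i in range(32):
    # the last part also takes the remainder when total is not divisible by 32
    end = total if i == 31 else chunk * (i + 1)
    download(chunk * i, end)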

download is a function that takes two integer parameters: the starting byte and the ending byte of the part of the target file that the thread should download. The calls are run asynchronously.
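The splitting step can be sketched on its own, assuming the server supports HTTP range requests (it would normally answer such a request with status 206). `split_ranges` and `range_header` are hypothetical helper names. Note that the HTTP `Range` header is inclusive at both ends, so the end offset of each part is its last byte, not one past it.

```python
def split_ranges(total, parts=32):
    """Return inclusive (start, end) byte offsets that cover bytes 0..total-1."""
    base, extra = divmod(total, parts)
    ranges = []
    start = 0
    for i in range(parts):
        # The first `extra` parts get one more byte so no remainder is lost.
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # fewer bytes than parts: skip the empty ones
        ranges.append((start, start + size - 1))
        start += size
    return ranges

def range_header(start, end):
    """Build the request header for one inclusive byte range."""
    return {"Range": f"bytes={start}-{end}"}

for start, end in split_ranges(10, parts=4):
    print(range_header(start, end))
```

Each thread would then pass its own header to its own request and write the received bytes at offset `start` of the output file.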

And then, how to pause and resume downloads?
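One possible approach to pause, resume and stop is cooperative: the download loop checks a pair of `threading.Event` objects between chunks. This is a minimal sketch under that assumption; `DownloadControl` and `worker` are hypothetical names, and the chunks are faked with a list instead of a network stream.

```python
import threading
import time

class DownloadControl:
    """Pause/resume/stop switch shared between the GUI and download threads."""

    def __init__(self):
        self._running = threading.Event()
        self._running.set()            # start in the running state
        self._stopped = threading.Event()

    def pause(self):
        self._running.clear()

    def resume(self):
        self._running.set()

    def stop(self):
        self._stopped.set()
        self._running.set()            # wake a paused worker so it can exit

    def wait_if_paused(self):
        """Block while paused; return False once stop() has been requested."""
        self._running.wait()
        return not self._stopped.is_set()

def worker(chunks, control, received):
    for chunk in chunks:
        if not control.wait_if_paused():
            break                      # stop requested
        received.append(chunk)

control = DownloadControl()
received = []
control.pause()                        # paused before the worker starts
t = threading.Thread(target=worker, args=([b"a", b"b", b"c"], control, received))
t.start()
time.sleep(0.1)
paused_snapshot = list(received)       # nothing received while paused
control.resume()
t.join()
print(paused_snapshot, received)
```

A download can only pause between chunks this way, so smaller chunks give a quicker response at the cost of more loop iterations.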

How can the above concepts be realized?
