Memory usage keeps growing when I download files with a multi-threaded Queue. The data fetched by requests seems to stay in memory and is never released, which looks very strange. Does anyone know why?
Here is my source code:
import threading
from queue import Queue
import requests
import shutil  # imported but not used below


class MultiThreadsModule():
    def __init__(self, thread_num):
        self.print_lock = threading.Lock()
        self.compress_queue = Queue()  # work queue of URLs
        self.thread_num = thread_num
        self.all_nodes = None          # list of URLs to download
        self.func_main = None          # function called once per URL

    def thread_loop(self):
        # start thread_num daemon worker threads that consume the queue
        self.thread_list = []
        for _ in range(self.thread_num):
            t = threading.Thread(target=self.process_queue)
            t.daemon = True
            t.start()
            self.thread_list.append(t)

    def node_queue(self, nodes):
        # enqueue every URL, then block until all of them are processed
        for node in nodes:
            self.compress_queue.put(node)
        self.compress_queue.join()

    def process_queue(self):
        # worker loop: take a URL, run func_main on it, mark the task done
        while True:
            node = self.compress_queue.get()
            self.func_main(node)
            self.compress_queue.task_done()

    def run(self):
        self.node_queue(self.all_nodes)


def download_file_(url):
    # stream=True defers reading the body, but r.text then reads the
    # whole body into memory and decodes it as a string
    r = requests.get(url, stream=True, timeout=600)
    return r.text


if __name__ == '__main__':
    mtm = MultiThreadsModule(20)
    mtm.all_nodes = ["https://www.sec.gov/Archives/edgar/data/913951/000095013399003276/0000950133-99-003276-d2.pdf"] * 1000
    mtm.func_main = download_file_
    mtm.thread_loop()
    mtm.run()
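For comparison, the same fan-out over a fixed pool of 20 workers can be sketched with the standard-library concurrent.futures.ThreadPoolExecutor. This is a swapped-in technique, not the code from the question; run_all is a hypothetical helper name, and the sketch only shows the worker-pool shape, not a diagnosis of the memory growth.

```python
from concurrent.futures import ThreadPoolExecutor


def run_all(urls, func, workers=20):
    """Call func(url) for every URL on a fixed pool of worker threads.

    Results are collected in input order, so func should return something
    small (a status, a byte count, a file path) rather than the whole body.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every result; leaving the with-block then shuts
        # the pool down, similar to Queue.join() plus stopping the workers.
        return list(pool.map(func, urls))
```

Note that run_all(mtm.all_nodes, download_file_) would keep all 1000 decoded texts alive in the returned list, which is why the docstring suggests returning only small results.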
Memory usage grows with the size of the downloaded PDFs. When I shut down the download script, memory returns to where it was at the start.
Here is the output of free at successive points during the run:
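If the goal is to save each PDF rather than keep it as a string, a common pattern with requests is to keep stream=True and write the body to disk in fixed-size pieces with Response.iter_content, inside a with block so the response is closed. This is a sketch under assumptions, not a confirmed fix for the growth described here; save_pdf, copy_chunks, the destination path and the 64 KiB chunk size are all hypothetical choices for illustration.

```python
import requests

CHUNK_SIZE = 64 * 1024  # assumed chunk size (64 KiB), not from the question


def copy_chunks(chunks, write):
    """Pass each non-empty byte chunk to write(); return the bytes written."""
    total = 0
    for chunk in chunks:
        if chunk:  # skip empty chunks
            write(chunk)
            total += len(chunk)
    return total


def save_pdf(url, path, timeout=600):
    # Hypothetical helper: with stream=True the body is read lazily, and
    # iter_content() yields it piece by piece, so the whole PDF is never
    # held in memory at once. The with-block closes the response, which
    # releases its connection back to the pool.
    with requests.get(url, stream=True, timeout=timeout) as r:
        r.raise_for_status()
        with open(path, "wb") as f:
            return copy_chunks(r.iter_content(chunk_size=CHUNK_SIZE), f.write)
```

The 1000 URLs in the question are identical, so each call would need its own destination path to keep threads from overwriting the same file.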
(base) jay@ubuntu:~$ free
              total        used        free      shared  buff/cache   available
Mem:       16030900     7501440     6342032      207336     2187428     8000996
Swap:       2097148       66828     2030320
(base) jay@ubuntu:~$ free
              total        used        free      shared  buff/cache   available
Mem:       16030900     7499512     6344080      207124     2187308     8003148
Swap:       2097148       66828     2030320
(base) jay@ubuntu:~$ free
              total        used        free      shared  buff/cache   available
Mem:       16030900     7484624     6304240      202692     2242036     8022128
Swap:       2097148       66828     2030320
(base) jay@ubuntu:~$ free
              total        used        free      shared  buff/cache   available
Mem:       16030900     7482960     6305724      202692     2242216     8023788
Swap:       2097148       66828     2030320
(base) jay@ubuntu:~$ free
              total        used        free      shared  buff/cache   available
Mem:       16030900     7559828     6210116      216200     2260956     7933424
Swap:       2097148       66828     2030320
(base) jay@ubuntu:~$ free
              total        used        free      shared  buff/cache   available
Mem:       16030900     7559700     6204536      217868     2266664     7931840
Swap:       2097148       66828     2030320
(base) jay@ubuntu:~$ free
              total        used        free      shared  buff/cache   available
Mem:       16030900     7637356     6127720      212544     2265824     7859580
(base) jay@ubuntu:~$ free
              total        used        free      shared  buff/cache   available
Mem:       16030900     7871248     5816500      262944     2343152     7575232
Swap:       2097148       66828     2030320
(base) jay@ubuntu:~$ free
              total        used        free      shared  buff/cache   available
Mem:       16030900     8412848     5193552      252832     2424500     7042864
Swap:       2097148       66828     2030320
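The free readings above are system-wide and in KiB: between the first and last reading, used rises from 7501440 to 8412848 (911408 KiB, about 890 MiB) and available falls from 8000996 to 7042864 (958132 KiB). To check that this growth belongs to the download script itself, one option is to log the script's own memory from inside it. The sketch below is an assumption about how that could be instrumented; peak_rss_kib and current_rss_kib are hypothetical helper names. The peak value can only grow, while the /proc reading (Linux only, which matches the Ubuntu machine here) also shows whether memory is ever released.

```python
import resource
import sys


def peak_rss_kib():
    """Peak resident set size of this process, in KiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS
    return peak // 1024 if sys.platform == "darwin" else peak


def current_rss_kib():
    """Current resident set size in KiB, read from /proc (Linux only)."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                # the line looks like "VmRSS:     12345 kB"
                return int(line.split()[1])
    return None
```

Printing both values every few dozen completed downloads (for example from process_queue, after task_done) would show whether the process's own resident memory tracks the numbers reported by free.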
The strangest part is that if I switch to a file from another website, everything is normal and memory never grows as shown above. The other link: sse.com.cn/disclosure/listedinfo/announcement/c/2019-12-27/… .
(base) jay@ubuntu:~$ curl -I https://www.sec.gov/Archives/edgar/data/913951/000095013399003276/0000950133-99-003276-d2.pdf
HTTP/1.1 200 OK
Date: Thursday, 16-Jan-20 12:07:34 CST
Keep-Alive: timeout=58
Content-Length: 0
HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Type: application/pdf
ETag: "9a2245050c11f5db74ed734eb31b31a0"
Last-Modified: Mon, 02 Oct 2017 21:50:29 GMT
Server: AmazonS3
x-amz-id-2: BwWoaQxPxWSoKT3cJz2fpFLf9j53sdO20m4IedR9I5ZJNBHIFyH4AuqiN9HRx45sSdw/NmhkAjs=
x-amz-meta-mode: 33188
x-amz-replication-status: REPLICA
x-amz-request-id: 33A22AD1A6F87DE0
x-amz-version-id: lVtscFRHVvquEIo8.Q7sUnmAO1nQkKm7
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
Content-Length: 1741924
Date: Thu, 16 Jan 2020 04:07:35 GMT
Connection: keep-alive
Strict-Transport-Security: max-age=31536000 ; includeSubDomains ; preload
(base) jay@ubuntu:~$ curl -I http://www.sse.com.cn/disclosure/listedinfo/announcement/c/2019-12-27/603818_2018_nA.pdf
HTTP/1.1 200 OK
Content-Length: 3899122
Accept-Ranges: bytes
Age: 588
Content-Type: application/pdf
Date: Thu, 16 Jan 2020 04:08:09 GMT
Etag: "WAa9c77cbc49bb90b3"
Keep-Alive: timeout=58
Last-Modified: Thu, 26 Dec 2019 09:35:22 GMT
Server: Apache
X-Wa-Info: [V2.S11101.A12708.P79382.N26848.RN0.U4201449325].[OT/pdf.OG/documents]
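One visible difference between the two curl -I outputs is that the sec.gov request (HTTPS) prints an extra HTTP/1.1 200 OK block with Content-Length: 0, a Keep-Alive: timeout=58 header and an RFC 850 style CST Date before the real Amazon S3 response, while the sse.com.cn response (plain HTTP) carries the same Keep-Alive: timeout=58. A guess, not confirmed by anything in the question, is that a proxy sits in between. requests reads proxy settings from the environment by default (Session.trust_env is True), and the standard library can list what it sees; show_proxies is a hypothetical helper name.

```python
import urllib.request


def show_proxies():
    # getproxies() reads the *_proxy environment variables (and, on
    # Windows/macOS, the system proxy settings); requests consults the
    # same environment variables unless trust_env is disabled.
    proxies = urllib.request.getproxies()
    for scheme, url in sorted(proxies.items()):
        print(f"{scheme}: {url}")
    if not proxies:
        print("no proxy configured in the environment")
    return proxies
```

An empty result would rule out an environment-configured proxy, though not a transparent one on the network.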
Memory keeps being consumed until the system can no longer allocate any more.