
I'm using requests and threading in Python to do some stuff. My question is: is this code truly running multithreaded, and is it safe to use? I'm experiencing some slowdown over time. Note: I'm not using this exact code, but mine does similar things.

import time
import threading

import requests

current_threads = 0
max_threads = 32


def doStuff():
    global current_threads
    r = requests.get('https://google.de')
    current_threads -= 1  # unsynchronized write to the shared counter


while True:
    # Busy-wait until a thread slot frees up
    while current_threads >= max_threads:
        time.sleep(0.05)

    thread = threading.Thread(target=doStuff)
    thread.start()

    current_threads += 1
  • Yes, it's running multiple threads. It would be better for you to use a [`concurrent.futures.ThreadPoolExecutor`](https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ThreadPoolExecutor) with `max_workers=max_threads` (see the sketch after these comments). The code you have keeps creating new threads, which might be the cause of the eventual slow-down. As for safety, changing a shared variable like that without something like a `Lock` to prevent concurrent access could also be problematic, and unnecessary if you used a `ThreadPoolExecutor`. – martineau May 31 '19 at 21:39
  • @martineau Sir, may I draw your attention [here](https://stackoverflow.com/questions/56344611/how-can-take-advantage-of-multiprocessing-and-multithreading-in-deep-learning-us)? There's a **bounty** for a similar issue. – Mario May 31 '19 at 21:58
  • @martineau Thanks for the answer but why exactly is a ThreadPoolExecutor better? –  May 31 '19 at 22:37
  • Brian: Well, offhand, besides not requiring the modification of global variables in a potentially unsafe way, it doesn't create an unlimited number of threads, it's a debugged built-in written by experts, and it was designed precisely to do the kind of thing it appears you're attempting. – martineau May 31 '19 at 23:01
  • Brian: This [answer](https://stackoverflow.com/a/14991752/355230) to another question also has a good explanation about how it's better. – martineau Jun 03 '19 at 09:39
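
For reference, a minimal sketch of the `ThreadPoolExecutor` approach suggested in the comments; `doStuff` and `max_threads` come from the question, while the bounded loop count is just for illustration (the original loops forever, which would grow the work queue without limit):

import requests
from concurrent.futures import ThreadPoolExecutor

max_threads = 32


def doStuff():
    # No shared counter needed: the pool itself caps concurrency
    requests.get('https://google.de')


# Reuses max_threads worker threads instead of creating a new
# thread per request
with ThreadPoolExecutor(max_workers=max_threads) as executor:
    for _ in range(1000):  # bounded here for illustration
        executor.submit(doStuff)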

2 Answers


There could be a number of reasons for the issue you are facing. I'm not an expert in Python, but the potential causes of the slowdown I can think of are as follows:

  1. Depending on the size of the data you are pulling down, you could be saturating your bandwidth. That's a hard one to prove without seeing the exact code you are using, knowing what it does, and knowing your bandwidth.

  2. Kind of connected to the first one, but if your files are taking some time to come down per thread, it may be getting clogged up at the:

    while current_threads >= max_threads:
        time.sleep(0.05)
    

    You could try reducing the max number of threads and see if that helps, though it may not if it's the files that are taking time to download.

  3. The problem may not be with your code or your bandwidth but with the server you are pulling the files from; if that server is overloaded, it may be slowing down your transfers.

  4. Firewalls, IPS, IDS, or policies on the server may be throttling your requests. If you make too many requests too quickly, all from the same IP, the server-side network equipment may mistake this for some sort of DoS attack and throttle your requests in response.

  5. Unfortunately Python, compared to lower-level languages such as C# or C++, is not as good at multithreading. This is due to the GIL (Global Interpreter Lock), which allows only one thread to execute Python bytecode at a time. This is quite a sizeable subject in itself, but if you want to read up on it, have a look at this link and the sketch after it.

https://medium.com/practo-engineering/threading-vs-multiprocessing-in-python-7b57f224eadb
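
A quick illustration of the caveat (my sketch, not from the answer): the GIL serializes CPU-bound Python code, but CPython releases it while a thread blocks on network I/O, so threaded `requests.get()` calls do overlap in practice:

import threading
import time

import requests


def fetch(url):
    # The GIL is released while this thread waits on the socket,
    # so several downloads can be in flight at once
    requests.get(url)


start = time.time()
threads = [threading.Thread(target=fetch, args=('https://google.de',))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Finishes in roughly one request's latency, not eight
print(f'8 threaded requests took {time.time() - start:.2f}s')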

Sorry I can't be of any more assistance, but this is as much as I can say on the subject given the information provided.

Dave748

Sure, you're running multiple threads, and provided they're not accessing/mutating the same resources, you're probably "safe".

Whenever I'm accessing external resources (i.e., using requests), I always recommend asyncio over vanilla threading, as it allows custom context switching (everywhere you have an `await` you switch contexts, whereas in vanilla threading the OS decides when to switch between threads, which might not be optimal) and has reduced overhead (you're only using ONE thread).
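
A minimal sketch of that approach; the answer doesn't name a library, so this assumes the third-party `aiohttp` package for the HTTP calls (requests itself is not async):

import asyncio

import aiohttp


async def fetch(session, url):
    # Each await is an explicit context-switch point
    async with session.get(url) as resp:
        return resp.status


async def main():
    async with aiohttp.ClientSession() as session:
        # 32 concurrent requests, all on a single thread
        tasks = [fetch(session, 'https://google.de') for _ in range(32)]
        print(await asyncio.gather(*tasks))


asyncio.run(main())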
