
What is the best and fastest Pythonic way to multithread a PUT request that sits inside a for loop? At the moment the code is synchronous, so it takes too long to run. We would therefore like to add multithreading to improve the runtime.

Synchronous:

import requests

def econ_post_customers(self, file, data):
    try:
        for i in range(0, len(file['collection'])):
            rp = requests.put(url=self.url, headers=self.headers, params=self.params, data=data)
    except StopIteration:
        pass

We attempted multithreading, but starting a thread on every iteration just seems unnecessary: we have thousands of iterations, and we might end up with many more, so that would become a big mess of threads. Maybe a pool would solve the problem, but this is where I am stuck.

Does anyone have an idea of how to solve this?

Parallel:

import threading

import requests

def econ_post_customers(self, file, data):
    try:
        for i in range(0, len(file['collection'])):
            threading.Thread(target=lambda: request_put(self.url, self.headers, self.params, data)).start()
    except StopIteration:
        pass

def request_put(url, headers, params, single):
    return requests.put(url=url, headers=headers, params=params, data=single)

Any help is highly appreciated. Thank you for your time!

Buster3650
  • What are you trying to achieve? The highest throughput of PUT requests? If so, then async is the way forward, not multithreading. Check out [asyncio](https://docs.python.org/3/library/asyncio.html) and [aiohttp](https://docs.aiohttp.org/en/stable/), for example (see the sketch after these comments) – Pynchia Jul 06 '21 at 09:37
  • I have paths looking like www.xxx.xxx/movie/{number}. I have to iterate through the numbers: www.xxx.xxx/movie/{1}, then do a PUT request for that number on one iteration, then www.xxx.xxx/movie/{2} and a PUT request for that number on the next iteration, then www.xxx.xxx/movie/{3}, and so on. Now, if there are 1000 numbers to iterate through, I have to wait a long time, as each iteration takes a little while due to the PUT request. But if I run them at the same time, it would not take nearly as long. – Buster3650 Jul 06 '21 at 09:41
  • As they access the same URL, they don't end up in race conditions, so it doesn't matter which threads finish first or last. – Buster3650 Jul 06 '21 at 09:42
  • Maybe [ThreadPool](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.ThreadPool) with `apply_async`/`starmap_async` is the simplest solution. See: https://stackoverflow.com/questions/3033952/threading-pool-similar-to-the-multiprocessing-pool – Stanislav Ivanov Jul 06 '21 at 09:55
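
For illustration, here is a minimal sketch of the asyncio/aiohttp approach suggested in the comments. The base URL pattern, the number range, the concurrency cap of 100, and the payload are placeholder assumptions, not taken from the question:

import asyncio

import aiohttp

async def put_one(session, number, data):
    # hypothetical URL pattern based on the comment above
    url = f"https://www.xxx.xxx/movie/{number}"
    async with session.put(url, data=data) as resp:
        return resp.status

async def put_all(numbers, data):
    sem = asyncio.Semaphore(100)  # cap the number of in-flight requests

    async with aiohttp.ClientSession() as session:

        async def bounded(number):
            async with sem:
                return await put_one(session, number, data)

        return await asyncio.gather(*(bounded(n) for n in numbers))

# e.g. asyncio.run(put_all(range(1, 1001), data={"key": "value"}))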

2 Answers


Do try the grequests module, which works with gevent (plain requests is not designed for async use).

If you try it you should get great results. (If it does not work, please do say so.)
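
A rough, untested sketch of how that might look for the loop in the question. Note that grequests should be imported before requests so gevent can monkey-patch the standard library; the pool size of 50 is an arbitrary choice:

import grequests  # must be imported before requests

def econ_post_customers(self, file, data):
    # build one unsent request per item in the collection
    reqs = (
        grequests.put(self.url, headers=self.headers, params=self.params, data=data)
        for _ in file['collection']
    )
    # send them concurrently; size caps how many run at once
    return grequests.map(reqs, size=50)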

Hemesh

If you want to use multithreading, then the following should work. However, I am a bit confused about a few things. You seem to be doing PUT requests in a loop, but all with exactly the same arguments, and I don't quite see how you could get a StopIteration exception in the code you posted. Also, using a lambda expression as your target argument, rather than just specifying the function name and passing the arguments as a separate tuple or list (as is done below), is a bit unusual.

Assuming that the loop variable i is in reality being used to index one value that actually varies in the call to request_put, the map function could be a better choice than apply_async (see the sketch after the code below). It probably does not matter significantly for multithreading, but it could make a performance difference for multiprocessing if you were looping over a very large list of elements.

from multiprocessing.pool import ThreadPool

import requests

def econ_post_customers(self, file, data):
    MAX_THREADS = 100  # some suitable value
    n_tasks = len(file['collection'])
    pool_size = min(MAX_THREADS, n_tasks)
    pool = ThreadPool(pool_size)
    for i in range(n_tasks):
        pool.apply_async(request_put, args=(self.url, self.headers, self.params, data))
    # wait for all tasks to complete:
    pool.close()
    pool.join()

def request_put(url, headers, params, single):
    return requests.put(url=url, headers=headers, params=params, data=single)
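
For reference, a sketch of the map-based variant mentioned above. It assumes each element of file['collection'] carries the payload for its own PUT request; the question does not show the real structure:

from functools import partial
from multiprocessing.pool import ThreadPool

def econ_post_customers(self, file, data):
    items = file['collection']
    pool_size = min(100, len(items))
    with ThreadPool(pool_size) as pool:
        # bind the fixed arguments; map passes each item as `single`
        # and blocks until every request has completed
        return pool.map(
            partial(request_put, self.url, self.headers, self.params),
            items,
        )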
Booboo
  • Hi Booboo, I understand the confusion, and that is my fault for not explaining my program and idea fully. You are right that running the exact same PUT request several times is meaningless, and that is in fact not what I am doing in my program. I iterate the URL and data as well in the for loop, but for simplicity I edited the code a little when asking the question, to not confuse you too much... although I see that it might have confused even more. That was not intentional. Anyway, the data and URL change as well, so JSON objects are put to their right REST URL. – Buster3650 Jul 06 '21 at 16:14
  • As the URL changes and can go out of bounds, I need to include StopIteration. Otherwise it will continue until it crashes with a StopIteration exception. – Buster3650 Jul 06 '21 at 16:16
  • See [How to create a Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example) so that you can edit your question and I might then give a more helpful answer, unless what I have already provided gives you enough information for you to take it from here. – Booboo Jul 06 '21 at 16:42
  • Fortunately, your solution works for me. Synchronous time: 23 minutes; parallel time: 38 seconds. I will definitely read that post before asking future questions, and thank you very much for your help and time. It is appreciated :-) – Buster3650 Jul 06 '21 at 17:08