
I am basically calling an API with the various values coming from the list list_of_string_ids.

I am expecting to create 20 threads, have each one make a request, write the values to the DB, then return and pick up the next item from the list, and so on.

I am having trouble getting this to work using threading. Below is code that works correctly as expected, but it takes very long to finish execution (around 45 minutes or more). The website I am getting the data from allows async I/O at a rate of 20 requests.

I assume this can make my code 20x faster, but I'm not really sure how to implement it.

import requests
import json
import time
import threading
import queue

headers = {'Content-Type': 'application/json',
               'Authorization': 'Bearer TOKEN'}

start = time.perf_counter()
project_id_number = 123
project_id_string = 'pjiji4533'
name = "Assignment"
list_of_string_ids = [132,123,5345,123,213,213,...,n]      # Len of list is 20000


def construct_url_threaded(project_id_number, id_string):
    url = f"https://api.test.com/{project_id_number}/{id_string}"
    r = requests.get(url, headers=headers)    # Max rate allowed is 20 requests at once.
    json_text = r.json()
    comments = json.dumps(json_text, indent=2)
    for item in json_text['data']:
        # DO STUFF
        pass

for string_id in list_of_string_ids:
    construct_url_threaded(project_id_number=project_id_number, id_string=string_id)

My attempt is below:

def main():
    q = queue.Queue()
    threads = [threading.Thread(target=construct_url_threaded, args=(project_id_number, string_id, q)) for i in range(5)]  # 5 is for testing
    for th in threads:
        th.daemon = True
        th.start()

    result1 = q.get()
    result2 = q.get()
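For reference, the worker/queue pattern described above (a fixed pool of daemon threads draining a shared queue of IDs, then joining) could be completed along these lines. This is a minimal sketch, not your exact code: the URL, headers, and `fetch_one` body are placeholders based on the question, and the `fetch` parameter is a hypothetical hook so the plumbing can be exercised without the real API.

```python
import queue
import threading

import requests

NUM_WORKERS = 20  # matches the site's stated limit of 20 concurrent requests

headers = {'Content-Type': 'application/json',
           'Authorization': 'Bearer TOKEN'}


def fetch_one(id_string, project_id_number=123):
    # Placeholder fetch modeled on construct_url_threaded() above.
    url = f"https://api.test.com/{project_id_number}/{id_string}"
    r = requests.get(url, headers=headers)
    return r.json()


def worker(q, results, fetch):
    # Each worker pulls IDs until the queue is drained, then exits.
    while True:
        try:
            id_string = q.get_nowait()
        except queue.Empty:
            return
        results.append(fetch(id_string))  # list.append is thread-safe in CPython


def run_all(string_ids, fetch=fetch_one, num_workers=NUM_WORKERS):
    q = queue.Queue()
    for id_string in string_ids:
        q.put(id_string)
    results = []
    threads = [threading.Thread(target=worker, args=(q, results, fetch),
                                daemon=True)
               for _ in range(num_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()  # wait until every worker has drained the queue
    return results
```

Note that results come back in completion order, not input order, so pair each result with its ID if order matters.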
  • Welcome to SO! The placeholder info makes it hard for me to tell what you're trying to accomplish. You probably can't show the actual API if it uses a key, but you can use a mock API like https://jsonplaceholder.typicode.com/ or https://httpbin.org/ to create a [mcve]. If it were me, I'd look into using a [thread pool](https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ThreadPoolExecutor) and a request session, but it's hard to offer more specifics than that. Thanks. – ggorlen Feb 13 '21 at 16:38
  • @ggorlen Hi, I edited my post a bit, tried to show more details. Does this help? Thank you in advance – William Holmes Feb 13 '21 at 16:52
  • I can't run the code or determine what output you want, but thanks anyway. [Here's](https://stackoverflow.com/questions/46264056/python-download-multiple-files-from-links-on-pages/65996653#65996653) a canonical session/threadpoolexecutor example that might help. – ggorlen Feb 13 '21 at 16:56
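Following the thread-pool suggestion in the comments, a minimal sketch using `concurrent.futures.ThreadPoolExecutor` with a shared `requests.Session` might look like this. The URL is the question's placeholder, and the injectable `fetch` parameter is hypothetical, added so the concurrency can be tested without the real API.

```python
from concurrent.futures import ThreadPoolExecutor

import requests

headers = {'Content-Type': 'application/json',
           'Authorization': 'Bearer TOKEN'}


def fetch(session, project_id_number, id_string):
    # Placeholder request modeled on the question's code; a Session
    # reuses TCP connections across requests, which helps at this volume.
    url = f"https://api.test.com/{project_id_number}/{id_string}"
    return session.get(url, headers=headers).json()


def fetch_all(project_id_number, string_ids, max_workers=20, fetch=fetch):
    # max_workers=20 caps concurrency at the site's stated rate limit.
    with requests.Session() as session, \
         ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fetch, session, project_id_number, id_string)
                   for id_string in string_ids]
        return [f.result() for f in futures]  # results in input order
```

Collecting `f.result()` from the futures list (rather than `as_completed`) keeps the output aligned with `string_ids`.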

0 Answers