
I am writing a Python script that makes 6,000 API calls, and it takes about 20 minutes to complete when run synchronously. I am trying to see if I can reduce the run time to less than a minute.

Below are the steps I am taking:

  1. Read a CSV file that contains SKUs (unique product identifiers).
  2. Inside a function, use a for loop to construct a URL with the SKU at the end.
  3. Pass the response object (which contains the product name and price) to another function.
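The three steps above could be sketched roughly as follows. This is only a minimal sketch under assumptions: the CSV column name `sku`, the base URL `https://api.example.com/products/`, and the JSON fields `name` and `price` are hypothetical placeholders, not details of the real API.

```python
import csv
import io
import json
from urllib.parse import quote

# Hypothetical endpoint; replace with the real API's base URL.
BASE_URL = "https://api.example.com/products/"

def read_skus(csv_file, column="sku"):
    """Step 1: collect the non-empty values of the SKU column from an open CSV file."""
    return [row[column].strip() for row in csv.DictReader(csv_file) if row[column].strip()]

def build_url(sku):
    """Step 2: append the URL-escaped SKU to the base URL."""
    return BASE_URL + quote(sku, safe="")

def parse_product(body):
    """Step 3: pull name and price out of a JSON response body (assumed shape)."""
    data = json.loads(body)
    return data["name"], data["price"]

# Usage with in-memory stand-ins for the CSV file and the API response.
sample_csv = io.StringIO("sku\nABC-1\nXY/2\n")
skus = read_skus(sample_csv)
urls = [build_url(s) for s in skus]
product = parse_product('{"name": "Widget", "price": 9.99}')
```

Keeping the three steps as separate small functions makes it easier to run only the network step concurrently later.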

As mentioned, the synchronous version takes about 20 minutes or more. I tried using threads, and the run time went down to 5 minutes, but the problem with threads is that the target function does not return anything. To work around that I used the queue module, but then the run time increased.

I am new to Python, so I am not sure what I can do to reduce the run time. I looked into asyncio, but it is very confusing and I am not even sure where to start with it.
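As a possible starting point for asyncio, here is a minimal sketch of the usual pattern: one coroutine per SKU, a semaphore to cap how many run at once, and `asyncio.gather` to collect the return values. The `fetch_product` coroutine is a hypothetical stand-in that only sleeps; a real version would need an asynchronous HTTP client, because a blocking HTTP call inside a coroutine gives no speedup. The limit of 50 is an arbitrary assumption.

```python
import asyncio

async def fetch_product(sku):
    """Hypothetical stand-in for a real asynchronous HTTP call."""
    await asyncio.sleep(0.01)  # simulates network latency
    return {"sku": sku, "name": f"product-{sku}", "price": 1.0}

async def fetch_all(skus, limit=50):
    """Run fetch_product for every SKU, with at most `limit` in flight at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(sku):
        async with semaphore:
            return await fetch_product(sku)

    # gather() returns the results in the same order as the input SKUs.
    return await asyncio.gather(*(bounded(s) for s in skus))

results = asyncio.run(fetch_all([f"SKU{i}" for i in range(200)]))
```

The semaphore matters because launching thousands of requests at once can overwhelm the client or trip rate limits on the server.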


So I added the code below; I am not sure if it is the best way to handle this. It returns the responses in 127 seconds for 2,750 API calls.

I added 10 functions and I am using 10 threads to call them.

import concurrent.futures

# Submit each of the ten functions to the pool; submit() returns a Future.
with concurrent.futures.ThreadPoolExecutor() as executor:
    first = executor.submit(get_stock_info_first)
    second = executor.submit(get_stock_info_second)
    third = executor.submit(get_stock_info_third)
    fourth = executor.submit(get_stock_info_fourth)
    fifth = executor.submit(get_stock_info_fifth)
    sixth = executor.submit(get_stock_info_sixth)
    seventh = executor.submit(get_stock_info_seventh)
    eight = executor.submit(get_stock_info_eight)
    t9 = executor.submit(get_stock_info_t9)
    t10 = executor.submit(get_stock_info_t10)

# result() blocks until the function has finished and returns its value.
return_value1 = first.result()
return_value2 = second.result()
return_value3 = third.result()
return_value4 = fourth.result()
return_value5 = fifth.result()
return_value6 = sixth.result()
return_value7 = seventh.result()
return_value8 = eight.result()
return_value9 = t9.result()
return_value10 = t10.result()

I would like to know how I can improve this code. Please let me know if there is a better way to do this.
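One possible simplification, sketched here under assumptions: instead of ten near-identical functions, a single function takes the SKU as an argument and `executor.map` runs it across the whole list. The name `get_stock_info`, the base URL, and the worker count of 20 are hypothetical choices; the `fetch` parameter exists only so the sketch can be run with a fake function and no network access.

```python
import concurrent.futures
import urllib.request

BASE_URL = "https://api.example.com/products/"  # hypothetical endpoint

def get_stock_info(sku):
    """Fetch one product and return (sku, raw response bytes)."""
    with urllib.request.urlopen(BASE_URL + sku, timeout=10) as response:
        return sku, response.read()

def fetch_all(skus, fetch=get_stock_info, max_workers=20):
    """Call `fetch` for every SKU on a thread pool and collect the return values."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() yields results in input order and re-raises any worker exception.
        return list(executor.map(fetch, skus))

# Usage with a fake fetch function so the sketch runs without network access.
def fake_fetch(sku):
    return sku, b'{"name": "demo", "price": 1.0}'

results = fetch_all(["A1", "B2", "C3"], fetch=fake_fetch)
```

With this shape the number of threads becomes a single tunable parameter rather than something tied to how many functions were written.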

Bhargav Rao
  • You definitely can get the return value of a thread. https://stackoverflow.com/questions/6893968/how-to-get-the-return-value-from-a-thread-in-python – Arya McCarthy Apr 04 '20 at 01:59
  • Ideally, there would be a remote endpoint that works in batches, but few designs consider high-throughput cases. Regardless, assuming the bottleneck is the [web] request itself, threads (or other concurrency) will definitely improve performance for the part that can run in parallel. However, no minimal code is given, so it is unclear what _the_ issue might be. I suspect the constructs are not being used appropriately. Also, too _much_ concurrency (or concurrency with too much overhead) can be worse than no concurrency. – user2864740 Apr 04 '20 at 02:31

1 Answer


You could try using a profiler to find out where your program spends the most time or CPU cycles; you can also profile memory usage if that is a concern.

The Python docs have some recommendations for profilers (https://docs.python.org/2/library/profile.html), and I'm aware that there are also more visual ones out there with nice GUIs.
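As a rough illustration of profiling with the standard library's `cProfile` and `pstats` modules, here is a minimal sketch. The functions `slow_step` and `run_job` are hypothetical stand-ins for the real script; the sleep simulates time spent waiting on a request.

```python
import cProfile
import io
import pstats
import time

def slow_step():
    """Hypothetical stand-in for the part of the script being measured."""
    time.sleep(0.05)

def run_job():
    for _ in range(3):
        slow_step()

profiler = cProfile.Profile()
profiler.enable()
run_job()
profiler.disable()

# Sort by cumulative time and keep the ten most expensive entries.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(10)
print(report.getvalue())
```

For a script dominated by API calls, most of the cumulative time would be expected to show up in the network-waiting functions, which would suggest an I/O-bound workload rather than a CPU-bound one.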