
I am trying to read Indian stock market data using API calls. For this example, I have used 10 stocks. My current program is:

First I define the Function:

import requests

def get_prices(stock):
    start_unix = 1669794745
    end_unix = start_unix + 1800
    interval = 1
    url = ('https://priceapi.moneycontrol.com/techCharts/indianMarket/stock/history'
           f'?symbol={stock}&resolution={interval}&from={start_unix}&to={end_unix}')
    url_data = requests.get(url).json()
    print(url_data['c'])

Next, I use multithreading. I do not know much about how multithreading works - I just used the code from a tutorial on the web.

from threading import Thread
stocks = ['ACC','ADANIENT','ADANIGREEN','ADANIPORTS','ADANITRANS','AMBUJACEM','ASIANPAINT','ATGL','BAJAJ-AUTO','BAJAJHLDNG']
threads = []
for i in stocks:
    threads.append(Thread(target=get_prices, args=(i,)))
    threads[-1].start()
for thread in threads:
    thread.join()

The above program takes around 250 to 300 ms to run. In reality, I shall need to run it for thousands of stocks. Is there any way to make it run faster? I am running the code in a Jupyter Notebook on an Apple M1 8-core chip. Any help will be greatly appreciated. Thank you!

Samit
  • A difficulty here is likely going to be the response from the website. That is, you are sending a request to the website and you have to wait for their response. There is nothing you can do to make them respond faster, or to make your message and theirs travel faster. You may be able to send them a request that asks for more than one stock at a time - so you have one back-and-forth instead of thousands. – scotscotmcc Nov 30 '22 at 19:10
  • @scotscotmcc Thank you very much for your response. Yes, the server response times are not in my control. But apart from that, is there anything that I can do programmatically? Maybe using some more advanced multithreading/multiprocessing library, or making some changes to the above code, etc.? – Samit Nov 30 '22 at 19:18
  • Have you checked with this web site to see if they have a way to submit bulk requests, where you can ask for 20 or 50 results in one request? THAT will be your best plan for a speedup. – Tim Roberts Nov 30 '22 at 20:08

1 Answer


When scraping data from the web, most of the time is typically spent waiting for server responses. In order to issue a large number of queries and get the responses as fast as possible, issuing multiple queries in parallel is the right approach. To be as efficient as possible, you have to find the right balance between a large number of parallel requests and being throttled (or blacklisted) by the remote service.

In your code, you are creating as many threads as there are requests. In general, you would want to limit the number of threads and reuse the threads once they have performed a request in order to save resources. This is called a thread pool.
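
Here is a minimal thread-pool sketch using the standard library's concurrent.futures module; it reuses the get_prices function and stock list from your question, and the max_workers value is only an example that you would tune against the service's rate limits:

# Thread-pool sketch: a fixed number of worker threads is reused for all requests.
from concurrent.futures import ThreadPoolExecutor

stocks = ['ACC','ADANIENT','ADANIGREEN','ADANIPORTS','ADANITRANS','AMBUJACEM','ASIANPAINT','ATGL','BAJAJ-AUTO','BAJAJHLDNG']

# max_workers is an arbitrary example value; pick it so the remote service
# does not throttle or blacklist you.
with ThreadPoolExecutor(max_workers=10) as executor:
    executor.map(get_prices, stocks)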

Since you are using Python, a lighter alternative to multiple threads is to run parallel I/O tasks using asyncio. Sample implementations of parallel requests using either a thread pool or asyncio are shown in this Stack Overflow answer.

Edit: here is an example adapted from your code, using asyncio:

import asyncio
from aiohttp import ClientSession

stocks = ['ACC','ADANIENT','ADANIGREEN','ADANIPORTS','ADANITRANS','AMBUJACEM','ASIANPAINT','ATGL','BAJAJ-AUTO','BAJAJHLDNG']

async def fetch_price(session, stock, start_unix, end_unix, interval):
    url = f'https://priceapi.moneycontrol.com/techCharts/indianMarket/stock/history?symbol={stock}&resolution={interval}&from={start_unix}&to={end_unix}'
    async with session.get(url) as resp:
        data = await resp.json()
        return stock, data['c']

async def main():
    start_unix = 1669794745
    end_unix = start_unix + 1800
    interval = 1
    async with ClientSession() as session:
        tasks = []
        for stock in stocks:
            tasks.append(asyncio.create_task(
                fetch_price(session, stock, start_unix, end_unix, interval)
            ))
        prices = await asyncio.gather(*tasks)
    print(prices)

asyncio.run(main())
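
Note that a Jupyter notebook already runs its own event loop, so asyncio.run() will complain about a running loop there. In a notebook you can simply await main() in a cell instead, or (as the comment below mentions) apply the third-party nest_asyncio package first - a small, assumed setup snippet:

# Only needed inside Jupyter/IPython, where an event loop is already running.
import nest_asyncio
nest_asyncio.apply()  # makes the running event loop reentrant

asyncio.run(main())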
DurandA
  • Thank you very much. It worked perfectly. I had to import `nest_asyncio` additionally along with the above program. The wall time has reduced to around 160-180 ms from 250-300 ms. This is great! Please allow me to wait for some more time before marking it as solved; I want to see if there are any other ways to make it faster. Thanks a lot again for putting in the effort to write that brilliant code. – Samit Nov 30 '22 at 20:48
  • When using `aiohttp` for a large number of requests, you can increase the limit of parallel requests by [passing an explicit connector to `ClientSession`](https://stackoverflow.com/questions/53961359/understanding-aiohttp-tcpconnector-pooling-connection-limits). – DurandA Nov 30 '22 at 21:16
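
For reference, a minimal sketch of the connector suggestion from the last comment; the limit value is only an example, and the rest of the request logic stays as in the answer above:

# Raise the cap on simultaneous connections with an explicit TCPConnector.
from aiohttp import ClientSession, TCPConnector

async def main():
    connector = TCPConnector(limit=200)  # example value; aiohttp bounds parallel connections by default
    async with ClientSession(connector=connector) as session:
        ...  # create and gather fetch_price tasks as in the answer above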