
I am building a trading algo in Python. I read data from a broker API, which lets me subscribe to receive market data for various securities.

I am trying to find patterns in the price of options. Ideally, I would subscribe to thousands of options' live data, and try to detect my patterns there.

Since this looks like an I/O-bound situation (I'm mostly waiting on trade events that happen sporadically), I went with threads: I create one thread per security and have it wait for the pattern to arise. I also create auxiliary threads, such as one extra thread per market-monitoring thread that waits for a stop event and then shuts the monitoring thread down.
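For context, here is a minimal sketch of the setup I describe above. The symbols and the blocking wait are placeholders; the real version blocks on the broker API's next trade event instead of `stop.wait()`:

```python
import threading

def monitor_security(symbol: str, stop: threading.Event) -> None:
    """Wait for trade events on one security until asked to stop."""
    while not stop.is_set():
        # placeholder for the broker-API call that blocks until the next event;
        # here we just idle until the stop event fires (or 1 s passes)
        stop.wait(timeout=1.0)

stop_events = {}
threads = []
for symbol in ["OPT_CALL_150", "OPT_PUT_150"]:  # hypothetical option symbols
    stop = threading.Event()
    stop_events[symbol] = stop
    t = threading.Thread(target=monitor_security, args=(symbol, stop), daemon=True)
    t.start()
    threads.append(t)

# shutting everything down: one set() per monitored security
for stop in stop_events.values():
    stop.set()
for t in threads:
    t.join()
```

With a "stopper" thread added per monitoring thread, as I do, the thread count is at least 2x the number of securities, which is why the limit bites so quickly.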

Now that I have a few threads per monitored security, I find that with just a few option chains I hit `RuntimeError: can't start new thread`. Using the answer to this question, I computed that the limit on my setup is 884 threads, which is far too low for the application I have in mind.
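The probe I used boils down to something like this sketch: spawn idle threads until the OS refuses, then release them all (the `cap` parameter is my own addition, to keep the probe from exhausting memory):

```python
import threading

def count_max_threads(cap: int = 100_000) -> int:
    """Start idle threads until the OS refuses (or `cap` is hit), then clean up."""
    release = threading.Event()
    threads = []
    try:
        while len(threads) < cap:
            # each thread just parks on the event so it costs no CPU
            t = threading.Thread(target=release.wait)
            t.start()
            threads.append(t)
    except RuntimeError:  # raised as "can't start new thread"
        pass
    finally:
        release.set()  # unblock every parked thread so it can exit
        for t in threads:
            t.join()
    return len(threads)
```

On my machine this returns 884; the ceiling comes from per-thread OS resources (stack space, kernel limits), not from Python itself.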

I'm quite surprised; I thought threads in Python were just an abstract way to split CPU time. If so, why can't I create an arbitrary number of threads, at the risk of each getting very little CPU time? And if threads are a poor choice for this application, is there a better one?

Mysterry
  • Isn't that what async is meant for? – 9769953 Dec 05 '21 at 15:59
  • Do you want to continuously receive data from each live set of data/connection? do you need to process it instantaneously as well? And how much data is there per second for each connection? – 9769953 Dec 05 '21 at 16:00
  • @9769953 Indeed I want to continuously receive data for each live set of data. The processing should be as fast as possible, but if adding a few seconds of delay to it changes things, that would be a solution. Each connection provides a live feed of the current market price. I don't know the precise sampling rate I get from the API, but I am interested in monitoring trade events, so *changes* in the price. Depending on the option, such events appear every 1 to 30s – Mysterry Dec 05 '21 at 16:29
  • At every 1 to 30 seconds, I would think this makes a case for async, where every connection is polled every so often (by just one thread, going through connections). If events were (far) more frequent, plus all the processing, you'd need roughly as many threads (and probably as much CPU power) as connections. Keeping track of and processing thousands of data connections will cost you compute power anyway, whether you use threads or not. – 9769953 Dec 05 '21 at 16:35
  • @9769953 thanks for the idea! Is there any go-to async package in Python, that would let me design a pool of tasks so that a thread goes through them? – Mysterry Dec 05 '21 at 17:36
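To illustrate the suggestion in the comments: the standard-library `asyncio` package is the usual choice. A minimal sketch, where `watch_security` is a stand-in for awaiting trade events from a broker's async API (the symbol names and the simulated wait are placeholders):

```python
import asyncio
import random

async def watch_security(symbol: str) -> str:
    """Stand-in for awaiting trade events from the broker's async API."""
    # a real client would look like: `async for tick in client.subscribe(symbol): ...`
    await asyncio.sleep(random.uniform(0.01, 0.05))  # simulated wait for a trade event
    return f"{symbol}: pattern check done"

async def main(symbols: list[str]) -> list[str]:
    # one lightweight task per security; tasks share a single thread,
    # so thousands of them are fine where thousands of OS threads are not
    tasks = [asyncio.create_task(watch_security(s)) for s in symbols]
    return await asyncio.gather(*tasks)

results = asyncio.run(main([f"OPT{i}" for i in range(1000)]))
```

Unlike OS threads, each task is just a Python object plus a coroutine frame, so the 884-thread ceiling does not apply; `asyncio.gather` also preserves the input order of the tasks.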

0 Answers