
We are working on algorithmic trading software in C#. We monitor the market price and, based on certain conditions, we want to buy the stock.

User input can be taken from the GUI (WPF) and sent to the back-end for monitoring.

The back-end receives data continuously from the stock exchange and checks whether the user-entered price meets certain limits and conditions. If all are satisfied, then we buy / sell the stock (in futures, FUT).

Now, I want to design my back-end service.

  1. I need the Task Parallel Library or a custom thread pool where I create my tasks / threads / pool when the application starts (maybe incremental, or fixed at, say, 5000).
  2. All of them will be in a waiting state.
  3. Once a user creates an algorithm, we activate one thread from the pool, and it monitors the price for each incoming string. If it matches, we buy / sell and then go back into the waiting state. (I don't want to create and destroy the threads / tasks, as that is time-consuming.)

So can you please help me in this regard? Is the above approach good, or is there another approach?

I am stuck on this idea and not able to think outside the box.

VMAtm
SNR

4 Answers


The above approach is definitely not "good".

Given the idea above, the architecture is wrong in many cardinal aspects. If your project aspires to survive in the 2017+ markets, try to learn from the mistakes already made in the years 2007-2016.


[Figure: NBBO flutter for all U.S. stocks, 2007-01 to 2012-01. Lower values mean better NBBO stability; higher values mean instability. (courtesy NANEX)]

Financial Markets operate on nanosecond scales

Yes, a few inches of glass-fibre signal-propagation delay decide between PROFIT and LOSS.

If you plan to trade in the stock markets, your system will observe the HFT crowd doing the dirty practice of quote stuffing and vacuum-cleaning them right in front of your nose, at such scales that your single-machine multi-threaded execution will just move through the thin air of a gap already created many microseconds before your decision took place on your localhost CPU.

[Figure: The rise of HFT from 2007-01 to 2012-01 (courtesy NANEX)]

You can read more about the illusion of liquidity here.

See the expansion of quotes against the level of trades: [Figure: quotes vs. trades (courtesy NANEX)]

Even if one decides to trade in a single instrument on FX, the times are prohibitively short (more than 20% of the ToB (top-of-book) bids change in less than 2 ms and do not arrive at your localhost before your trading algorithm can react accordingly).

If your TAMARA measurements at your localhost are similar to this, simply forget about trading in any HF/MF/LF-HFT instruments -- you simply do not see the real market (the tip of the iceberg), as the +20% of price events happen in the very first column (1 .. 2 ms), where you do not see a single event at all!


user3666197
  • great knowledge, and surely our goal is to reach nanoseconds in the 2017+ markets. Thanks a ton for your comments and answer. This shows me where I am and how steep my learning curve is. Honestly, I have to learn a lot to understand this. – SNR Nov 04 '16 at 19:38
  • If your Programme is indeed financed to get inside the 1ns race of the races, be sure to give me an offer to join it. **A cool ride!** – user3666197 Nov 04 '16 at 19:42
  • @SNR, if speaking about **nanoseconds-driven** design practices, one might also enjoy an overview of CPU / GPU code + Cache + RAM latencies, to bear in mind once hard-real-time systems are to be designed **>>>** http://stackoverflow.com/a/33065382/3666197 – user3666197 Nov 06 '16 at 08:25
  • thanks a ton for the information. As I said, I can't understand it straight away :-). I will be more than happy to make you an offer, but I am not at that stage right now. But I will definitely engage you down the line. – SNR Nov 06 '16 at 16:09

5000 threads is bad; don't ever do that. You'll lose more performance to context-switching overhead than you'll gain from parallel execution. As a rule of thumb, the number of threads in your application should, by default, equal the number of cores in your system. There are other possible variants, but they probably aren't the best option for you.

So you can use a ThreadPool with a work-item method containing an infinite loop. This is very low-level, but it gives you control over what is going on in your system. A callback function could update the UI so the user is notified about the trading results.
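The worker-loop idea above can be sketched roughly as follows. This is a minimal sketch, not the original poster's design: the `TickWorkerPool` name, the `decimal` tick type, and the `onTick` callback are all hypothetical placeholders.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// A fixed pool of worker threads (one per core by default) that block on a
// queue of incoming ticks instead of being created and destroyed per request.
public sealed class TickWorkerPool : IDisposable
{
    private readonly BlockingCollection<decimal> _ticks = new BlockingCollection<decimal>();
    private readonly Thread[] _workers;

    public TickWorkerPool(Action<decimal> onTick, int? workerCount = null)
    {
        int n = workerCount ?? Environment.ProcessorCount;
        _workers = new Thread[n];
        for (int i = 0; i < n; i++)
        {
            _workers[i] = new Thread(() =>
            {
                // GetConsumingEnumerable blocks until an item arrives,
                // so idle workers cost no CPU time.
                foreach (var tick in _ticks.GetConsumingEnumerable())
                    onTick(tick);
            }) { IsBackground = true };
            _workers[i].Start();
        }
    }

    // Called by the market-data feed for every incoming price.
    public void Publish(decimal price) => _ticks.Add(price);

    public void Dispose()
    {
        // Drain the queue and let all workers finish cleanly.
        _ticks.CompleteAdding();
        foreach (var w in _workers) w.Join();
    }
}
```

Note that the workers "wait" by blocking on the queue, which achieves the "pool of waiting threads" goal from the question without ever spinning up 5000 threads.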

However, since you say you can use the TPL, I suggest considering these two options for your case:

  1. Use a collection of tasks running forever, each checking for new trading requests. You should still tune the number of simultaneously running tasks, because you probably don't want them fighting each other for CPU time. Since LongRunning tasks are created with a dedicated background thread, too many of them will degrade your application's performance as well. In this approach you could introduce a strategy-pattern implementation for the algorithm being run inside each task.

  2. Set up a TPL Dataflow pipeline within your application. For this approach you should encapsulate the information about the algorithm inside a DTO object and introduce a pipeline:

    • A BufferBlock for storing all the incoming requests. You could use a BroadcastBlock here if you want to check the sell and buy options in parallel. You can link blocks with a boolean predicate so that different blocks process different types of requests.
    • An ActionBlock (maybe one block per user algorithm) that runs the algorithmic check for the pattern on which you base the decision.
    • An ActionBlock for storing all the buy / sell requests for data that successfully passed the algorithm.
    • A BufferBlock for the UI reaction, with Reactive Extensions (an introductory book for Rx, if you aren't familiar with it).

    This solution still has to be tuned with block-creation options, and you should measure exactly how your data flows across the trading algorithm, the speed of the decision-making, and the overall performance. Examine the defaults for the TPL Dataflow blocks; you can find them in the official documentation. Another good place to start is Stephen Cleary's introductory blog posts (Part 1, Part 2, Part 3) and chapter 4 of his book, which covers this library.
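A minimal sketch of the Dataflow pipeline from option 2 might look like this. It assumes the `System.Threading.Tasks.Dataflow` NuGet package and C# 9 records; the `TradeRequest` DTO, `algorithmMatches`, and `placeOrder` names are hypothetical placeholders for the user's own types and logic.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow; // NuGet package: System.Threading.Tasks.Dataflow

// Hypothetical DTO carrying one user request through the pipeline.
public record TradeRequest(string Symbol, decimal LimitPrice, bool IsBuy);

public static class TradingPipeline
{
    // buffer -> algorithmic check -> order placement, linked with
    // completion propagation so the whole pipeline can be drained.
    public static (ITargetBlock<TradeRequest> Input, Task Completion) Build(
        Func<TradeRequest, bool> algorithmMatches,
        Action<TradeRequest> placeOrder)
    {
        var buffer = new BufferBlock<TradeRequest>();

        // The check block may run in parallel across requests.
        var check = new TransformBlock<TradeRequest, (TradeRequest Req, bool Ok)>(
            r => (r, algorithmMatches(r)),
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = Environment.ProcessorCount });

        var execute = new ActionBlock<(TradeRequest Req, bool Ok)>(
            t => { if (t.Ok) placeOrder(t.Req); });

        var link = new DataflowLinkOptions { PropagateCompletion = true };
        buffer.LinkTo(check, link);
        check.LinkTo(execute, link);

        return (buffer, execute.Completion);
    }
}
```

Calling `Complete()` on the input block and awaiting the returned completion task drains everything in flight, which is the usual shutdown pattern for such a pipeline.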

VMAtm
  • With all due respect, excuse my system-design terminology purism. A thread-based execution is just a kind of a **`[concurrent]`-type** system scheduling, principally never a **`[parallel]`-type** system behaviour. InMOS Transputer hardware provided hardware-based `[parallel]`-type system behaviour & the `occam` language could thus provide **`PAR`** code-sections that were indeed **guaranteed to be executed under a truly parallel schedule**, yielding their **results at exactly the same guaranteed time.** Current XEON hardware may allow at most 2HT*nCpuCOREs concurrent INSTR execs. – user3666197 Nov 04 '16 at 18:57
  • As `TPL` stands for `Task Parallel Library`, I've used the word `parallel` for a tasks which will be run in `parallel` in terms of `TPL`. – VMAtm Nov 04 '16 at 19:01
  • Sure, NP :o) it is "so common" recently, infected by technology marketing, that the very technical purity has almost been lost. Shame on us if we forget our craftsmanship (and appear just to rephrase the marketing blabber too ;o). Enjoy the day, VMAtm. – user3666197 Nov 04 '16 at 19:06
  • @VMAtm: I am most concerned about CPU-cycle usage now, as we need to run so many parallel tasks to handle the connection and simultaneously store data in the database and in in-memory dictionaries. A lot of stuff for me to digest today. I know the basics of TPL, but these links will definitely route me in the correct direction. – SNR Nov 04 '16 at 19:39
  • @user3666197, thanks a lot for the update on XEON hardware and concurrent / parallel types. We are looking at a XEON server, as we need a low-latency system. Honestly, I need to study the terminology you used in your answer. :-) – SNR Nov 04 '16 at 19:42
  • @VMAtm, I know it is a basic doubt, but my algorithms ultimately use one common function to check whether the price is in range for buy or sell. Can we invoke this common function from all our parallel tasks? We can't use any locks, as that makes them synchronous again. – SNR Nov 04 '16 at 19:44
  • @SNR If your function does not produce any side effects, why not? It will be invoked with different parameters, like a `pure` function in functional programming. – VMAtm Nov 04 '16 at 20:07
  • @VMAtm, I am trying to write a generic function, but I will have a dynamic set of tasks on the left side and a dynamic object from the exchange on the right side. My single function can't process this, so I am thinking about how to write it. – SNR Nov 06 '16 at 16:10
  • You may use a `ConcurrentDictionary` to exchange the results. – VMAtm Nov 06 '16 at 20:57
  • @VMAtm, I am marking this as the answer. Starting in this direction and hoping for the best. Thank you. – SNR Nov 21 '16 at 07:27

With C# 5.0, the natural approach is to use async methods running on top of the default thread pool.

This way, you create Tasks quite often, but the most prominent cost of that is GC pressure. Unless you have very high performance requirements, that cost should be acceptable.
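A minimal sketch of what this looks like in practice. The `CheckTickAsync` name and the price/limit comparison are hypothetical placeholders for the user's own condition; the point is that each check is a cheap Task on the default pool rather than a dedicated thread.

```csharp
using System.Threading.Tasks;

public static class AsyncMonitor
{
    // Offloads one price check onto the default thread pool, so a WPF
    // UI thread can await it without blocking.
    public static async Task<bool> CheckTickAsync(decimal price, decimal limit)
    {
        return await Task.Run(() => price <= limit);
    }
}
```

A WPF handler would simply write `if (await AsyncMonitor.CheckTickAsync(tick, userLimit)) { /* place order */ }` and stay responsive while the pool does the work.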

svick
  • thank you very much. Our application must be a low-latency, high-performance application. This is one of the reasons I would like to create a custom pool. – SNR Nov 04 '16 at 16:55
  • I think you should first do it the simplest way, and only if that turns out to be insufficient look at how to improve it. Especially since the existing pool is likely optimized more than your custom one would be. – svick Nov 04 '16 at 17:04
  • thanks again. I will try it, and I will see if anyone has ideas on the custom pool. But thank you for the suggestion. – SNR Nov 04 '16 at 17:07

I think you would be better off with an event loop, and if you need to scale, you can always shard by stock.
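One way to read this suggestion: one single-consumer loop per stock symbol, so ticks for the same symbol are processed strictly in order and per-symbol state needs no locking. The following is a minimal sketch of that idea; the `ShardedLoops` name and the `handler` callback are hypothetical placeholders.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// One event loop per stock symbol. Publish routes a tick to its symbol's
// queue; a dedicated long-running task drains each queue in order.
public sealed class ShardedLoops
{
    private readonly ConcurrentDictionary<string, Lazy<BlockingCollection<decimal>>> _queues = new();
    private readonly ConcurrentBag<Task> _loops = new();
    private readonly Action<string, decimal> _handler;

    public ShardedLoops(Action<string, decimal> handler) => _handler = handler;

    public void Publish(string symbol, decimal price)
    {
        // Lazy ensures the loop for a symbol is started exactly once,
        // even under concurrent first publishes.
        var queue = _queues.GetOrAdd(symbol, s => new Lazy<BlockingCollection<decimal>>(() =>
        {
            var q = new BlockingCollection<decimal>();
            _loops.Add(Task.Factory.StartNew(() =>
            {
                foreach (var p in q.GetConsumingEnumerable())
                    _handler(s, p); // single consumer: in-order, lock-free per shard
            }, TaskCreationOptions.LongRunning));
            return q;
        })).Value;
        queue.Add(price);
    }

    // Drain all queues and stop the loops (e.g. on shutdown).
    public void Complete()
    {
        foreach (var lazy in _queues.Values) lazy.Value.CompleteAdding();
        Task.WaitAll(_loops.ToArray());
    }
}
```

Scaling out then just means moving some symbols' loops to another process or machine, since shards share no state.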

Pablo Alcubilla
  • http://stackoverflow.com/questions/2842264/is-it-possible-in-net-using-c-to-achieve-event-based-asynchronous-pattern-wi I found this link about event loops. Is this the kind of event loop you mean? Can you please provide a link? Thank you. – SNR Nov 04 '16 at 05:54