
This question has been obsessing me for a few weeks, but I was in too much of a rush to tackle it with proper hindsight. I finally delivered a quick-and-dirty solution to the anxious customer to relieve his fragile nerves (and mine too, as a consequence). I am now finally taking the time to examine this with a clear mind.

Basically, I am facing the challenge of processing Synchronous Data (Time-Ordered Series) as fast as possible (<5ms). After playing around with many Rube-Goldberg-style designs, such as ordered thread pools and hybrid solutions involving parallel worker queues and other wonky far-fetched ideas, thorough hands-on benchmarking proved that sticking to a plain old single-threaded chain process was by far the best choice for data integrity and performance.

However, at some point near the bottom of the app, this approach reaches its limit. I need to broadcast data in a parallel fashion to different processors. And this is where my headache starts again. When I make the Data Hub (see below) send data to processors in an asynchronous way (via a thread and a binary ring buffer), it is received by the downstream processors in a scrambled order, and data ordering is lost.

So I am looking for a way to send data in a parallel fashion to all processors while keeping the order straight. My other concern is asynchronicity: if the Data Hub's SendEvent delegate is subscribed to by the processors' Receive methods in the plain classical way (via +=), how is this going to behave?
First off, I don't want the subscriptions to be called "one by one"; I am sure there is a way to parallelize this. Secondly, and above all, I certainly do not want processor A to lock the whole chain until it finishes its work.

So here it is, folks. To make a long story short, I want to find a sensible way to keep sending data to processors without waiting for it to be processed, but at the same time I need each data processor to receive the data in an ordered fashion. This might sound incompatible, but I am sure there is a smart and pretty simple way to do it (I went so deep into this that I got really confused, which is why I am asking for your help, good people).
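
To make it concrete, here is a stripped-down sketch of what I mean (the hub and handler names are simplified): with plain `+=` subscriptions, the subscribers run sequentially on the publisher's thread, so one slow processor stalls the whole chain.

```csharp
using System;
using System.Threading;

class DataHub
{
    public event Action<DateTime> SendEvent;

    public void Publish(DateTime tick)
    {
        Action<DateTime> handlers = SendEvent;
        if (handlers != null)
            handlers(tick); // subscribers run one by one; Publish returns only after the last one finishes
    }
}

class Demo
{
    static void Main()
    {
        var hub = new DataHub();
        hub.SendEvent += t => Console.WriteLine("fast processor got " + t.ToString("HH:mm:ss.fff"));
        hub.SendEvent += t => { Thread.Sleep(50); Console.WriteLine("slow processor got " + t.ToString("HH:mm:ss.fff")); };

        hub.Publish(DateTime.UtcNow); // takes ~50 ms: the slow subscriber blocks the publisher
    }
}
```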

(architecture diagram omitted: the Data Hub broadcasting data to the downstream processors)

Mehdi LAMRANI
  • I liked your "parallel worker queues" idea. Specifically, [blocking queues](http://stackoverflow.com/a/530228), which allow data to be queued while it awaits processing. – Robert Harvey Jul 16 '12 at 15:33
  • Are the processors located on a single (multicore) system or are they distributed machines? Which order is required: pre-processor, post-processor, or both? What is your .NET version? – oleksii Jul 16 '12 at 15:37
  • What constitutes "order" in your question? – Kit Jul 16 '12 at 16:23
  • @Kit : As stated, it is a Time-Ordered Series (datetime with milliseconds) – Mehdi LAMRANI Jul 16 '12 at 17:41
  • @RobertHarvey Yes indeed, but then again we introduce considerable latency, which happens to be unacceptable in this case. The queue should be very small, but we often end up with a big buffer due to the real-time data stream – Mehdi LAMRANI Jul 16 '12 at 17:43
  • @oleksii No way, we have 24 cores at our disposal. Both orders are required, and we use .NET 4 – Mehdi LAMRANI Jul 16 '12 at 17:45
  • @oleksii : 24 Cores on the same machine (far better for latency) – Mehdi LAMRANI Jul 16 '12 at 17:53
  • @MikaJacobi ok got you :) Well sometimes people go for humble servers, but large number of them (horizontal scalability). Whereas some do vertical scalability by increasing processing power. I will try to mention a few things that can be relevant. See [this video](http://vimeo.com/3584536) on [Hadoop](https://hadoop.apache.org/). Also check out C# [task continuation](http://msdn.microsoft.com/en-us/library/ee372288.aspx) and ordered execution [discussion](http://stackoverflow.com/a/3639811/706456) – oleksii Jul 16 '12 at 18:00
  • @MikaJacobi - Helps if I read. I guess what I meant is what is the scope of the order? Is it that *T1* has to be delivered to some processor *P* before *T2* to some other processor? Or that *T1* has to be sent before *T2* (but not necessarily handled by *P2*), or what? They're all very different things. – Kit Jul 16 '12 at 21:02

3 Answers


If you are talking about a single machine in which you need to distribute the ordered series to all processors on that machine, I would have the Data Hub, continuing with your single-thread idea, enqueue serially to N private queues (enqueuing is relatively fast, so I wouldn't worry about blocking, and you get the benefit of knowing the item got enqueued to all the queues).

Each of these private queues would only allow dequeuing by one and only one processor. The likely slower "Data Hub --> Processor" work then runs in parallel, yet does not hold up the whole chain, as you put it, because...

You can then configure one processor per queue per thread (simple to manage), or one queue feeding N similar stateless processors (which would take additional downstream work to do the reordering).
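
A minimal sketch of this fan-out, assuming .NET 4's `BlockingCollection<T>` and the 1-processor/1-queue/1-thread configuration (the `OrderedFanOutHub` name and shape are just for illustration):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class OrderedFanOutHub<T>
{
    private readonly BlockingCollection<T>[] _queues;

    public OrderedFanOutHub(params Action<T>[] processors)
    {
        _queues = new BlockingCollection<T>[processors.Length];
        for (int i = 0; i < processors.Length; i++)
        {
            BlockingCollection<T> queue = new BlockingCollection<T>();
            Action<T> processor = processors[i];
            _queues[i] = queue;

            // One long-running consumer per processor: items come out
            // in exactly the order the hub put them in.
            Task.Factory.StartNew(() =>
            {
                foreach (T item in queue.GetConsumingEnumerable())
                    processor(item);
            }, TaskCreationOptions.LongRunning);
        }
    }

    // Called from the hub's single publishing thread.
    public void Publish(T item)
    {
        foreach (BlockingCollection<T> queue in _queues)
            queue.Add(item); // fast in-memory enqueue; does not wait for processing
    }

    public void Complete()
    {
        foreach (BlockingCollection<T> queue in _queues)
            queue.CompleteAdding();
    }
}
```

The publishing thread only pays for N in-memory `Add` calls per item; a slow processor just lets its own queue grow without delaying the hub or its siblings.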

In a more distributed system (e.g. multi-machine) you typically have something like the Data Hub sending a message to a messaging "Topic", and then letting the messaging infrastructure deliver it to all the consumers.

Kit

It sounds like you are looking for a Service Bus. In your diagram, that would be the DATA HUB. Microsoft provides a service called MSMQ that can be used for something like this, and I like to use a library called NServiceBus to simplify things.
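
For what it is worth, a rough sketch of raw MSMQ usage via `System.Messaging` (the queue path and string payload are invented for the example; NServiceBus layers its own API on top of transports like this):

```csharp
// Requires a reference to System.Messaging.dll and MSMQ installed on the machine.
using System.Messaging;

class MsmqSketch
{
    const string Path = @".\Private$\datahub";

    static void Send(string payload)
    {
        if (!MessageQueue.Exists(Path))
            MessageQueue.Create(Path);

        using (var queue = new MessageQueue(Path))
            queue.Send(payload); // serialized with the default XmlMessageFormatter
    }

    static string Receive()
    {
        using (var queue = new MessageQueue(Path))
        {
            queue.Formatter = new XmlMessageFormatter(new[] { typeof(string) });
            return (string)queue.Receive().Body; // blocks until a message arrives
        }
    }
}
```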

Steve Czetty
    I think OP will struggle to get <5ms response time with a message queue. – Davin Tryon Jul 16 '12 at 15:54
  • True.. Missed that part. The architecture could still apply, though. – Steve Czetty Jul 16 '12 at 16:03
  • @SteveCzetty : Don't take it badly, but I almost went into depression when I was in charge of the MSMQ support team in a big financial company a few years ago. Not really my best memory :-) This tech is a sledgehammer to kill a fly, IMHO. But my opinion is 100% biased indeed – Mehdi LAMRANI Jul 16 '12 at 17:47
  • @MikaJacobi LOL, I agree, that's why I use libraries like NServiceBus. (Which has other options than MSMQ) :) – Steve Czetty Jul 16 '12 at 18:00

Not sure if it is the best fit, but zeromq might work for you here. You can have queues that are all in-memory (fast), and you can fan out and then condense messages in multiple configurations (and I think it takes care of ordering, etc.).

I have no experience using zeromq on a project; however, it does have some compelling features and seems well supported (with bindings for many client languages, C# included).

The manual page has a good introductory video showing what the tool can achieve.
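
As a rough illustration only, here is a PUB/SUB fan-out sketch using NetMQ (one C# implementation of ZeroMQ; the endpoint, port, and frame contents are made up for the example):

```csharp
using System;
using NetMQ;
using NetMQ.Sockets;

class ZmqFanOutSketch
{
    static void Publisher()
    {
        using (var pub = new PublisherSocket())
        {
            pub.Bind("tcp://*:5556");
            // Note ZeroMQ's "slow joiner" caveat: subscribers must already be
            // connected, or they will miss the first frames.
            for (int i = 0; i < 100; i++)
                pub.SendFrame("tick " + i); // every connected subscriber gets every frame
        }
    }

    static void Subscriber()
    {
        using (var sub = new SubscriberSocket())
        {
            sub.Connect("tcp://localhost:5556");
            sub.Subscribe(""); // empty prefix = subscribe to everything
            while (true)
                Console.WriteLine(sub.ReceiveFrameString());
        }
    }
}
```

Each subscriber sees the frames in the order they were published on its connection, which lines up with the ordering requirement in the question.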

Davin Tryon