I have relatively fast streams of data coming into my program (4 streams at approx. 25 Hz each). I have to store every input, persist it, and upload it later on. The objects themselves are relatively simple, made only of strings and doubles.

I'm thinking I'll use a database for storage. I've considered files, but I think a database is better. Either way, if you have a suggestion for this, feel free to share it, but it's outside the scope of this question.

Now my problem lies here: how do I achieve this task properly?

The fast flow of objects will be stored into a collection (I don't know which yet), for each stream separately, and every once in a while (probably every 500 objects per stream or so), I'll save them in the database.

I'm afraid this will be prone to race conditions, since I'll be writing to the collection while removing objects from it.

Also, I don't think I need the collection to be ordered, because every object in every data stream carries its own timestamp. So it does not really matter if I happen to persist data in the "wrong" order, as long as it's saved in the database and removed from the collection, and the flow is not interrupted.

Basically, this could be a classic FIFO behaviour, but if it's easier not to then I should be fine anyway. Either way, I'm not sure how to achieve it in terms of logic. I've had my fair share of head scratching and I'd rather go prepared.

I don't specifically need copy-paste code, I'm looking for an actual answer with, if possible, an explanation.

  • What kind of collection do you recommend ?
  • Do I need some kind of asynchronous collection logic?
  • Do I need some kind of thread/lock logic ?
  • Is there a collection that can be modified on both ends at the same time?

I have no particular guideline in mind, I'm really open to suggestions.

EDIT: Also, it's worth mentioning I'm using C#, in case someone wants to link something from the documentation.

Thank you all very much for your time, as always, it is greatly appreciated :)

Gil Sand

1 Answer

You are looking for a queue (FIFO), in particular ConcurrentQueue: it handles the locking for you.

Alternatively, with such a low volume of data, a basic list with a lock around reads and writes may be enough.
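
As a rough illustration of the queue approach, here is a minimal sketch in which the receiving threads call `Enqueue` and the persisting side pulls items out in batches with `TryDequeue`. The `Sample` and `SampleBuffer` types and their members are hypothetical names made up for this example; only `ConcurrentQueue<T>` and its methods come from the framework.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical sample type; the property names are assumptions for illustration.
public sealed class Sample
{
    public string Stream { get; }
    public DateTime Timestamp { get; }
    public double Value { get; }

    public Sample(string stream, DateTime timestamp, double value)
    {
        Stream = stream;
        Timestamp = timestamp;
        Value = value;
    }
}

// Hypothetical wrapper: one instance per stream, or a single shared instance.
public sealed class SampleBuffer
{
    private readonly ConcurrentQueue<Sample> _queue = new ConcurrentQueue<Sample>();

    // Safe to call from any thread while another thread is draining.
    public void Add(Sample sample) => _queue.Enqueue(sample);

    // Approximate while producers are active; good enough for a batch threshold.
    public int Count => _queue.Count;

    // Removes up to maxCount samples in FIFO order and returns them as one batch.
    public List<Sample> DrainBatch(int maxCount)
    {
        var batch = new List<Sample>(maxCount);
        while (batch.Count < maxCount && _queue.TryDequeue(out Sample sample))
        {
            batch.Add(sample);
        }
        return batch;
    }
}
```

The consumer could call `DrainBatch(500)` once `Count` reaches the batch size and write the returned list to the database in a single call. No explicit `lock` is needed, since the queue synchronizes adds and removes internally.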

Alexei Levenkov
  • It has no operations to asynchronously (or even synchronously) get an item from the collection. – Servy Feb 06 '17 at 21:51
  • Thanks for your answer. It's interesting you consider this a low volume of data, as I considered it the opposite ! – Gil Sand Feb 06 '17 at 21:54
  • @Zil - I have a single-threaded C# service that processes realtime market data updates and calculates profit and loss for firm positions at 20 kHz (typical) and 100 kHz (burst). It's all relative, but don't be intimidated by 25 Hz, as that is easily doable provided the size of each update is not excessive. – hoodaticus Feb 06 '17 at 21:57
  • @Servy Thanks, I don't see any requirement in the post that one needs something beyond Enqueue/TryDequeue. I understood the "asynchronous" requirement as "from any thread"; I'm not really sure why one would have a collection that is local and supports true asynchronous operations. – Alexei Levenkov Feb 06 '17 at 21:58
  • Each update is very small: we're talking 4 numbers with 6 decimals (per stream). According to the link provided, the concurrent queue seems to solve every problem I had. I'll go with that. Thank you all for your time – Gil Sand Feb 06 '17 at 21:59
  • @AlexeiLevenkov I said async because I have 4 streams, so 4 collections, but only one database. So at some point the add/remove call of collection 3 will have to wait for the database to be done writing the previous calls of collection 0 1 and 2. I'm *assuming* this is async/await behaviour, but again I might be wrong – Gil Sand Feb 06 '17 at 22:02
  • @AlexeiLevenkov Using your approach would require the code to have a separate thread and sit there in a busy loop trying to take items until it had enough. If the collection actually provided an asynchronous operation to take an item, you could simply await that and use the item when one was available. The fact that you need a busy loop to actually get an item out makes `ConcurrentQueue` very rarely a useful container. – Servy Feb 06 '17 at 22:02
  • @Servy Is there anything you could suggest? In an answer maybe, considering you seem opposed to the current answer of Alexei – Gil Sand Feb 06 '17 at 22:03
  • @Zil 4 streams at 25 Hz is about 100 requests per second. That is a pretty good, but not unreasonably high, rate for an ASP.NET web site rendering dynamic pages. Your requirements seem to be way simpler than that (unless you mistyped 25 kHz). – Alexei Levenkov Feb 06 '17 at 22:04
  • There are much higher performing ways of doing this than using ConcurrentQueue, however, ConcurrentQueue is the simplest for this workload. I actually cancelled my answer that was much faster because this is all the performance you need. For the DB I recommend you don't index it at all though since it will be your bottleneck. – hoodaticus Feb 06 '17 at 22:04
  • @Servy if instantaneous saves had been mentioned, then indeed a fancier solution (like http://stackoverflow.com/questions/531438/c-triggering-an-event-when-an-object-is-added-to-a-queue) would be necessary. In this case, waiting for a fixed amount of time when the queue is empty would likely be all the OP needs to do to avoid a busy loop. – Alexei Levenkov Feb 06 '17 at 22:08
  • @Servy - a busy loop is not required here. At 25 Hz he can run a Threading.Timer. – hoodaticus Feb 06 '17 at 22:10
  • @AlexeiLevenkov That behavior *was* mentioned in the question. You chose to ignore it, and just assumed that the OP didn't know what they were talking about when they proposed the sensible behavior of having an asynchronous collection, instead assuming that they meant something radically different from what they actually asked. Your approach would not only result in the consumer waiting a while from the time an item is provided before handling it, but it also requires a thread to sit there in a loop, constantly being awoken and put back to sleep without doing any work. – Servy Feb 06 '17 at 22:10
  • @Servy you should provide a better answer to the question as you read it. I still don't see a requirement that incoming data must be saved to the DB instantly (I see quite the opposite: that delays and reordering are not critical, "and upload it later on" and "does not really matter if I happen to persist data in the "wrong" order, as long as it's saved in the database") – Alexei Levenkov Feb 06 '17 at 22:16
  • @Servy side note: "but it also requires a thread to sit there in a loop" - not sure why that is a problem to start with in such a case. Plus, normally you'd use `Task.Delay` for such a loop (or a timer) if blocking the thread with Sleep is not acceptable. – Alexei Levenkov Feb 06 '17 at 22:20
  • I'll clarify here. I intended (but this is always open to better options) to get the stream objects into a collection and wait a small amount of time (or number of iterations) before saving them to a database. I'm waiting because I don't want to make excessive calls to the database, and I'd rather write slower, bigger chunks than write 1 entry at 25 Hz. The point is, when the data stream stops, the database has about 5 seconds to get all the (remaining/in-memory) data. Besides that, best performance is obviously important, but everything else is not. – Gil Sand Feb 06 '17 at 22:22
  • @AlexeiLevenkov - even if you were using a WaitHandle it would still be looping over the signal. Since we know the data receive rate, and we know latency is unimportant, there is no reason to add that complexity to this answer. A timer works perfectly here. Even Thread.Sleep(1) would be fine. – hoodaticus Feb 06 '17 at 22:23
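
The periodic drain discussed in these comments could be sketched roughly as follows, using `Task.Delay` rather than a busy loop and doing one last flush when the stream stops. `PeriodicFlusher`, `RunAsync`, and the `saveBatch` callback are hypothetical names; `saveBatch` stands in for whatever database write is actually used, and the interval is an assumption to be tuned.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class PeriodicFlusher
{
    // Drains the queue every `interval` and passes each non-empty batch to
    // `saveBatch` (hypothetical callback standing in for the database write).
    // When `stop` is cancelled, whatever is still queued is flushed once more.
    public static async Task RunAsync<T>(
        ConcurrentQueue<T> queue,
        Func<IReadOnlyList<T>, Task> saveBatch,
        TimeSpan interval,
        CancellationToken stop)
    {
        while (true)
        {
            try
            {
                // Waits without blocking a thread; throws when stop is cancelled.
                await Task.Delay(interval, stop);
            }
            catch (OperationCanceledException)
            {
                break;
            }
            await FlushAsync(queue, saveBatch);
        }

        // Final flush for items that arrived after the last periodic drain.
        await FlushAsync(queue, saveBatch);
    }

    private static async Task FlushAsync<T>(
        ConcurrentQueue<T> queue,
        Func<IReadOnlyList<T>, Task> saveBatch)
    {
        var batch = new List<T>();
        while (queue.TryDequeue(out T item))
        {
            batch.Add(item);
        }
        if (batch.Count > 0)
        {
            await saveBatch(batch);
        }
    }
}
```

With four streams feeding one shared queue (or one flusher per queue), the producers never wait on the database; only the flusher does. Cancelling the token when the streams stop triggers the final flush, which is where the "about 5 seconds" budget mentioned above would apply.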