I am looking for advice on how best to architect a buffer structure that can handle a massive stream of incoming data that is processed more slowly than it arrives.
I wrote a custom binary reader that can stream up to 12 million byte arrays per second on a single thread, and I want to process that stream in a separate structure on the same machine but on a different thread. The problem is that the consumer cannot keep up with the producer, so I believe I need some sort of buffer between them. I am more interested in advice on the overall architecture than in code examples. I target .NET 4.0. Here is more information about my current setup and requirements.
Producer: Runs on a dedicated thread and reads byte arrays from files on a physical storage medium (an SSD, OCZ Vertex 3 Max IOPS). Approximate throughput is 12 million byte arrays per second. Each array is only 16 bytes long, so that is roughly 192 MB of payload per second. Fully implemented.
Consumer: Supposed to run on a separate thread from the producer. It consumes the byte arrays but must parse them into several primitive data types before processing the data, so it is significantly slower than the producer. Fully implemented.
In between: I am looking to set up a buffered structure that the producer publishes to and the consumer, well, consumes from. Not implemented.
I would be glad if some of you could comment, from your own experience or expertise, on what to consider when designing such a structure. Should the buffer implement a throttling algorithm that only requests new data from the producer once the buffer/queue is, say, half empty? How should locking and blocking be handled? I am sorry, I have very limited experience in this area. So far I have handled this kind of problem with a messaging bus, but none of the messaging bus technologies I looked at can handle the throughput I need. Any comments are very welcome!
Edit: I forgot to mention that the data is consumed by a single consumer only. The order in which the arrays are published also matters: it must be preserved so that the consumer receives them in the same order.
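To make the question concrete, here is a minimal sketch of the simplest shape I can picture. It is not what I have implemented. It assumes a bounded `BlockingCollection<T>` from `System.Collections.Concurrent` (available in .NET 4.0), which blocks the producer in `Add` when the buffer is full and, with its default queue, hands items out in FIFO order. The names `BoundedBufferSketch`, `RecordsPerBatch` and `MaxBatchesInFlight`, the batch sizes, and the sequence-number stamp standing in for real parsing are all hypothetical and exist only for illustration. Passing batches of records rather than individual 16-byte arrays is likewise just an assumption about how to keep the per-item overhead down.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public static class BoundedBufferSketch
{
    const int RecordSize = 16;          // each record is 16 bytes (from the setup above)
    const int RecordsPerBatch = 4096;   // assumption: records are handed over in batches
    const int MaxBatchesInFlight = 64;  // assumption: upper bound on buffered batches

    // Runs one producer and one consumer over a bounded buffer.
    // Returns the number of records consumed; inOrder reports whether
    // the batches arrived in the order they were published.
    public static long Run(int totalBatches, out bool inOrder)
    {
        // Bounded capacity: Add blocks while the buffer is full, which
        // throttles the producer to the consumer's pace.
        var buffer = new BlockingCollection<byte[]>(MaxBatchesInFlight);
        long consumed = 0;
        bool ordered = true;

        var consumer = new Thread(() =>
        {
            int expected = 0;
            // Blocks while the buffer is empty; ends after CompleteAdding
            // has been called and the buffer has been drained.
            foreach (byte[] batch in buffer.GetConsumingEnumerable())
            {
                // Stand-in for the real parsing: read the sequence number
                // the producer stamped into the first four bytes.
                int seq = BitConverter.ToInt32(batch, 0);
                if (seq != expected) ordered = false;
                expected++;
                consumed += batch.Length / RecordSize;
            }
        });
        consumer.Start();

        // Producer: stand-in for the binary reader.
        for (int i = 0; i < totalBatches; i++)
        {
            var batch = new byte[RecordSize * RecordsPerBatch];
            BitConverter.GetBytes(i).CopyTo(batch, 0);
            buffer.Add(batch);
        }
        buffer.CompleteAdding(); // signal that no more data will arrive
        consumer.Join();

        inOrder = ordered;
        return consumed;
    }

    public static void Main()
    {
        bool inOrder;
        long records = Run(1000, out inOrder);
        Console.WriteLine("records consumed: {0}, in order: {1}", records, inOrder);
    }
}
```

With this shape the "throttling" is just the bounded capacity, and the locking is whatever `BlockingCollection<T>` does internally. Whether that is fast enough at 12 million records per second, or whether a different structure is called for, is exactly what I am unsure about.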