Processing instrument capture data

Question

I have an instrument that produces a stream of data; my code accesses this data through a callback onDataAcquisitionEvent(const InstrumentOutput &data). The data processing algorithm is potentially much slower than the rate of data arrival, so I cannot hope to process every single piece of data (and I don't have to), but I would like to process as many as possible. Think of the instrument as an environmental sensor with a rate of data acquisition that I don't control. InstrumentOutput could, for example, be a class that contains three simultaneous pressure measurements in different locations.

I also need to keep some short history of data. Assume for example that I can reasonably hope to process a sample of data every 200ms or so. Most of the time I would be happy processing just a single last sample, but occasionally I would need to look at a couple of seconds worth of data that arrived prior to that latest sample, depending on whether abnormal readings are present in the last sample.

The other requirement is to get out of the onDataAcquisitionEvent() callback as soon as possible, to avoid data loss in the sensor.

Data acquisition library (third party) collects the instrument data on a separate thread.

I thought of the following design: have a single-producer/single-consumer queue and push the data tokens into the synchronized queue in the onDataAcquisitionEvent() callback.
On the receiving end, there is a loop that pops the data from the queue. The loop will almost never sleep because of the high rate of data arrival. On each iteration, the following happens:

Pop all the available data from the queue,
The popped data is copied into a circular buffer (I used boost circular buffer), this way some history is always available,
Process the last element in the buffer (and potentially look at the prior ones),
Repeat the loop.
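
For reference, one iteration of the loop described above might look roughly like the sketch below. It is only an illustration under assumptions: InstrumentOutput is a stand-in struct, SampleQueue, History and consumeOnce are hypothetical names, the queue is a plain mutex-protected vector, and std::deque stands in for the boost circular buffer so the example needs nothing beyond the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

// Hypothetical stand-in for the real sensor payload.
struct InstrumentOutput {
    double pressure[3];
};

// Mutex-protected queue; the callback only pushes, so it returns quickly.
class SampleQueue {
public:
    void push(const InstrumentOutput& s) {
        std::lock_guard<std::mutex> lock(m_);
        pending_.push_back(s);
    }
    // Take everything that has arrived so far in one short critical section.
    std::vector<InstrumentOutput> drain() {
        std::vector<InstrumentOutput> out;
        std::lock_guard<std::mutex> lock(m_);
        out.swap(pending_);
        return out;
    }
private:
    std::mutex m_;
    std::vector<InstrumentOutput> pending_;
};

// Bounded history that keeps only the newest `capacity` samples.
class History {
public:
    explicit History(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }
    void append(const std::vector<InstrumentOutput>& batch) {
        for (const auto& s : batch) {
            if (buf_.size() == capacity_) buf_.pop_front();  // drop the oldest
            buf_.push_back(s);
        }
    }
    std::size_t size() const { return buf_.size(); }
    const InstrumentOutput& latest() const { return buf_.back(); }
    // Index 0 is the oldest retained sample.
    const InstrumentOutput& operator[](std::size_t i) const { return buf_[i]; }
private:
    std::size_t capacity_;
    std::deque<InstrumentOutput> buf_;
};

// One loop iteration: pop all available data and copy it into the history.
// Returns true when there is a new latest sample to process.
bool consumeOnce(SampleQueue& q, History& h) {
    std::vector<InstrumentOutput> batch = q.drain();
    if (batch.empty()) return false;
    h.append(batch);
    return true;
}
```

The worker thread would call consumeOnce() repeatedly and, whenever it returns true, process h.latest() and look further back through the history only when that sample appears abnormal.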

Questions:

Is this design sound, and what are the pitfalls? and
What could be a better design?

Edit: One problem I thought of is when the size of the circular buffer is not large enough to hold the needed history; currently I simply reallocate the circular buffer, doubling its size. I hope I would only need to do that once or twice.

Please, be more specific: when you are writing about "data", do you mean actual PCM samples or some kind of events like e.g. MIDI events? — Frunsi, Jul 22 '12 at 03:49
Thanks, the "instrument" is not related to music, it is an environmental sensor. I'll update the original post. — Cattus, Jul 22 '12 at 04:04

score 3 · Answer 1 · answered Jul 22 '12 at 03:18

I have a bit of experience with data acquisition, and I can tell you a lot of developers have problems with premature feature creep. Because it sounds easy to simply capture data from the instrument into a log, folks tend to add unessential components to the system before verifying that logging is actually robust. This is a big mistake.

The other requirement is to get out of the onDataAcquisitionEvent() callback as soon as possible, to avoid data loss in the sensor.

That's the only requirement until that part of the product is working 110% under all field conditions.

Most of the time I would be happy processing just a single last sample, but occasionally I would need to look at a couple of seconds worth of data that arrived prior to that latest sample, depending on whether abnormal readings are present in the last sample.

"Most of the time" doesn't matter. Code for the worst case, because onDataAcquisitionEvent() can't be spending its time thinking about contingencies.

It sounds like you're falling into the pitfall of designing it to work with the best data that might be available, and leaving open what might happen if it's not available or if providing the best data to the monitor is ultimately too expensive.

Decimate the data at the source. Specify how many samples will be needed for the abnormal case processing, and attempt to provide that many, at a constant sample rate, plus a margin of maybe 20%.
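
A minimal sketch of what that could mean in code, assuming plain "keep one sample in every N" decimation (no filtering) and integer rates. Decimator and historyCapacity are hypothetical names, and the numbers in the usage note are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Keeps one sample out of every `factor`; cheap enough to run in the callback.
class Decimator {
public:
    explicit Decimator(std::size_t factor) : factor_(factor) {
        assert(factor > 0);
    }
    // Returns true for the sample that should be kept, false for the rest.
    bool keep() {
        const bool k = (count_ == 0);
        count_ = (count_ + 1) % factor_;
        return k;
    }
private:
    std::size_t factor_;
    std::size_t count_ = 0;
};

// Number of history slots needed for `seconds` of data at the decimated
// rate `rateHz`, plus `marginPercent` extra, rounded up.
std::size_t historyCapacity(std::size_t rateHz, std::size_t seconds,
                            std::size_t marginPercent) {
    return (rateHz * seconds * (100 + marginPercent) + 99) / 100;
}
```

For example, if the abnormal-case analysis needed 2 seconds of history at a decimated rate of 50 samples per second, historyCapacity(50, 2, 20) gives 120 slots, so the buffer can be sized once up front instead of being reallocated later.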

There should certainly be no loops that never sleep. A circular buffer is fine, but just populate it with whatever minimum you need, and analyze it only as frequently as necessary.

The quality of the system is determined by its stability and determinism, not trying to go an extra mile and provide as much as possible.

Thank you. Good point about not providing too much data; it is very possible that the acquisition rate becomes so high that I would be spending most of the time putting already irrelevant data into the buffer. — Cattus, Jul 22 '12 at 03:41

score 0 · Answer 2 · answered Jul 22 '12 at 04:18

Your producer/consumer design is exactly the right design. In real-time systems we often also give different run-time priorities to the consuming threads, though I'm not sure this applies in your case.

Use a data structure that's basically a doubly-linked list, so that if it grows you don't need to reallocate everything, and you still have O(1) access to the newest samples at the tail (reaching further back means walking the list).

If your memory isn't large enough to hold several seconds' worth of data (it should be: one sample every 200ms is 5 samples per second), then you need to see whether you can stand reading from auxiliary memory. That is a throughput question, though, and in your case it has nothing to do with your design or with the requirement for "getting out of the callback as soon as possible".

Consider an implementation of the queue that does not need locking (remember: single reader and single writer only!), so that your callback doesn't stall.
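
A rough sketch of such a queue, assuming a fixed capacity and exactly one producer thread and one consumer thread. SpscRing is a hypothetical name; this is an illustration rather than a tuned implementation, and a tested library such as boost::lockfree::spsc_queue is probably the safer choice in production.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Fixed-capacity single-producer/single-consumer ring buffer.
// Neither side ever takes a lock, so the callback cannot block on the worker.
template <typename T>
class SpscRing {
public:
    // One slot stays unused so that "full" and "empty" are distinguishable.
    explicit SpscRing(std::size_t capacity) : buf_(capacity + 1) {
        assert(capacity > 0);
    }

    // Producer thread only. Returns false (sample dropped) when the ring is full.
    bool tryPush(const T& v) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % buf_.size();
        if (next == tail_.load(std::memory_order_acquire)) return false;
        buf_[head] = v;
        head_.store(next, std::memory_order_release);  // publish the new element
        return true;
    }

    // Consumer thread only. Returns false when the ring is empty.
    bool tryPop(T& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire)) return false;
        out = buf_[tail];
        tail_.store((tail + 1) % buf_.size(), std::memory_order_release);
        return true;
    }

private:
    std::vector<T> buf_;
    std::atomic<std::size_t> head_{0};  // next slot to write (owned by producer)
    std::atomic<std::size_t> tail_{0};  // next slot to read (owned by consumer)
};
```

With this shape, onDataAcquisitionEvent() would only call tryPush() and return. A full ring simply drops the sample, which is acceptable here because not every sample has to be processed.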

If your callback is really quick, consider disabling interrupts/giving it a high priority. May not be necessary if it can never block and has the right priority set.

Thank you. A small correction -- the processing algorithm needs at most 200ms to run on a system that is not overloaded, but data arrival rate is much higher (which means I would have to skip samples). I don't have to process all the samples, but need to have history just in case. — Cattus, Jul 22 '12 at 04:24
No problem. My answer still stands, though. It seems like you can accommodate all the samples you need in memory, and the right design is still to add to the end of a linked list and have another thread handle the samples. (In that thread you can do whatever you want, including spawning more threads, etc.) — Nitzan Shaked, Jul 22 '12 at 05:16

score 0 · Answer 3 · answered Jul 22 '12 at 05:25

Questions, (1) is this design sound, and what are the pitfalls, and (2) what could be a better design. Thanks.

Yes, it is sound. But for performance reasons, you should design the code so that each processing stage works on an array of input samples rather than a single sample at a time. This usually results in much more efficient code on current CPUs.

The length of such an array (= a chunk of data) is either fixed (simpler code) or variable (flexible, but some processing may become more complicated).
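
To illustrate what a chunk-based stage could look like, here is a small sketch. InstrumentOutput is a stand-in struct and chunkPeak is a hypothetical processing step (per-channel peak pressure); the real algorithm would go in its place.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the real sensor payload.
struct InstrumentOutput {
    double pressure[3];
};

// Processes a whole chunk in one call instead of one sample per call.
// Works for fixed or variable chunk lengths, as long as n > 0.
InstrumentOutput chunkPeak(const InstrumentOutput* chunk, std::size_t n) {
    assert(n > 0);
    InstrumentOutput peak = chunk[0];
    for (std::size_t i = 1; i < n; ++i) {
        for (int c = 0; c < 3; ++c) {
            peak.pressure[c] = std::max(peak.pressure[c], chunk[i].pressure[c]);
        }
    }
    return peak;
}
```

The tight inner loop over contiguous samples is the part the compiler and the cache can take advantage of, which is the point of passing chunks between stages.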

As a second design choice, you probably should ignore the history at this architectural level, and relegate that feature...

Most of the time I would be happy processing just a single last sample, but occasionally I would need to look at a couple of seconds worth of data [...]

Maybe tracking a history should be implemented only in that special part of the code that occasionally requires access to it, rather than being part of the "overall architecture". If so, it simplifies the processing overall.
