
I have a thread running that reads a stream of bytes from a serial port. It does this continuously in the background, and reads from the stream come in at different, separate times. I store the data in a container like so:

using ByteVector = std::vector<std::uint8_t>;
ByteVector receive_queue;

When data comes in from the serial port, I append it to the end of the byte queue:

ByteVector read_bytes = serial_port->ReadBytes(100); // read 100 bytes; returns as a "ByteVector"
receive_queue.insert(receive_queue.end(), read_bytes.begin(), read_bytes.end());

When I am ready to read data in the receive queue, I remove it from the front:

unsigned read_bytes = 100;
// Read 100 bytes from the front of the vector by using indices or iterators, then:
receive_queue.erase(receive_queue.begin(), receive_queue.begin() + read_bytes);

This isn't the full code, but gives a good idea of how I'm utilizing the vector for this data streaming mechanism.

My main concern with this implementation is the removal from the front, which requires shifting each element removed (I'm not sure how optimized erase() is for vector, but in the worst case, each element removal results in a shift of the entire vector). On the flip side, vectors are candidates for CPU cache locality because of the contiguous nature of the data (but CPU cache usage is not guaranteed).

I've thought of maybe using boost::circular_buffer, but I'm not sure if it's the right tool for the job.

I have not yet coded an upper-limit for the growth of the receive queue, however I could easily do a reserve(MAX_RECEIVE_BYTES) somewhere, and make sure that size() is never greater than MAX_RECEIVE_BYTES as I continue to append to the back of it.

Is this approach generally OK? If not, what performance concerns are there? What container would be more appropriate here?

void.pointer
  • "but this is not guaranteed". I believe it is. – Barmar Jan 18 '17 at 17:19
  • Use std::deque. –  Jan 18 '17 at 17:19
  • @Barmar Not by the C++ standard. To be clear: I'm referring to the ability to store the array data in CPU cache, not talking about the requirement that vector's data be contiguous. – void.pointer Jan 18 '17 at 17:19
  • @void Yes, it is. –  Jan 18 '17 at 17:20
  • @latedeveloper Reference please. – void.pointer Jan 18 '17 at 17:20
  • @void.pointer http://stackoverflow.com/questions/849168/are-stdvector-elements-guaranteed-to-be-contiguous – Barmar Jan 18 '17 at 17:20
  • @void Not the standard, but http://en.cppreference.com/w/cpp/container/vector –  Jan 18 '17 at 17:21
  • @void.pointer I'm not sure what the difference between those are. If the data is contiguous in memory, then it can be cached together, so long as it fits in a cache line. – Barmar Jan 18 '17 at 17:21
  • @Barmar: Please go back and read my OP more carefully. I was not talking about vector contiguity. – void.pointer Jan 18 '17 at 17:21
  • We're veering off topic here, but my point was more general: Cache locality is not standardized because it depends on the hardware, size of the vector, etc. Many factors. In a truly portable implementation, I cannot use vector under those kinds of assumptions. So I want to focus on what makes the most sense, algorithmically. – void.pointer Jan 18 '17 at 17:23
  • @void.pointer As I said, I don't understand the distinction you're making between vector contiguity and cacheability. – Barmar Jan 18 '17 at 17:23
  • You are remembering to lock a mutex when you write to the queue and read from it, aren't you? Things will end badly if you don't. (It is *possible* to write a lock-free single producer single consumer queue, but I don't think there's one in the STL). – Martin Bonner supports Monica Jan 18 '17 at 17:34
  • @MartinBonner Unfortunately the STL does not, but boost does have a [lock free queue](http://www.boost.org/doc/libs/1_53_0/doc/html/boost/lockfree/queue.html). – NathanOliver Jan 18 '17 at 17:36
  • Worrying if something will fit in the CPU cache when you are dealing with serial interface speeds seems a little OTT. –  Jan 18 '17 at 17:37
  • I misworded my "this is not guaranteed" statement in my OP. I see where the confusion is. I edited my post to clarify. What I was trying to say was that usage of CPU cache is not guaranteed. – void.pointer Jan 18 '17 at 22:09
  • @void.pointer what is the significance of the "non-guarantee"? Indeed, it is not guaranteed that a CPU must have a cache, but is that relevant to your question? – eerorika Jan 18 '17 at 22:16
  • My point was, the element shifting that occurs in the vector might be negligible if the CPU cache can perform that operation fast enough that it won't matter. That's one of the reasons Bjarne recommends vector as a good default container, because even if something is algorithmically slower, it might end up being faster in real world applications because of cache locality. Ultimately my point is: I'm not counting on this behavior in my decision, because it's not guaranteed or reliable. In other words, I want to focus on the right algorithm. – void.pointer Jan 19 '17 at 00:28

2 Answers

5

Erasing from the front of a vector one element at a time can be quite slow, especially if the buffer is large (unless you can reorder elements, which you cannot with a FIFO queue).

A circular buffer is an excellent, perhaps ideal data structure for a FIFO queue of fixed size. But there is no implementation in the standard library. You'll have to implement it yourself or use a third party implementation such as the Boost one you've discovered.

The standard library provides a high-level structure for a growing FIFO queue: std::queue. For a lower-level data structure, the double-ended queue is a good choice (std::deque, which is the default underlying container of std::queue).

On the flip side, vectors are candidates for CPU cache locality because of the contiguous nature of the data (but this is not guaranteed).

The contiguous storage of std::vector is guaranteed. A fixed-size circular buffer also has contiguous storage.

I'm not sure anything is guaranteed about the cache locality of std::deque, but it is usually quite good in practice, since the typical implementation is a sequence of fixed-size arrays indexed through a small map of pointers.

eerorika
  • I'm not that familiar with boost's circular buffer, but does it allow pushing to the back without overwriting, or using two iterators (the push iterator and the read iterator) to keep track of positioning, and determining if they overlap? – void.pointer Jan 18 '17 at 17:28
  • Might also want a `std::deque` if they want more control. – NathanOliver Jan 18 '17 at 17:30
  • @void.pointer I'm not familiar with Boosts implementation of the circular buffer. You better check the docs. – eerorika Jan 18 '17 at 17:30
  • @NathanOliver yeah, I recommended it first before remembering std::queue. Now I mention both. – eerorika Jan 18 '17 at 17:31
  • I thought it was there. Sometimes I see one when really it is the other so I wasn't sure. +1 from me – NathanOliver Jan 18 '17 at 17:32
  • I had misworded a portion of my original question (the part you quoted). I meant to say that the CPU caching is not guaranteed. I have edited my original question to clarify this properly. – void.pointer Jan 18 '17 at 22:10
0

Performance will be poor, which may or may not matter. Taking from the head of a vector entails a cascade of moves. But the standard library has std::queue for exactly this purpose; just use one.

Malcolm McLean