
I've written a SharedQueue that is intended to work with several producers and consumers.

#include <iostream>
#include <queue>

#include <boost/noncopyable.hpp>
#include <boost/thread.hpp>

class SharedQueue : public boost::noncopyable
{
public:
  SharedQueue(size_t size) : m_size(size) {}
  ~SharedQueue() {}

  int count() const { return m_container.size(); }
  void enqueue(int item);
  bool enqueue(int item, int millisecondsTimeout);
  int dequeue();

private:
  const size_t m_size;
  boost::mutex m_mutex;
  boost::condition_variable m_buffEmpty;
  boost::condition_variable m_buffFull;

  std::queue<int> m_container;
};

void SharedQueue::enqueue(int item)
{
  {
    boost::mutex::scoped_lock lock(m_mutex);
    while(!(m_container.size() < m_size)) 
    {
      std::cout << "Queue is full" << std::endl;
      m_buffFull.wait(lock);
    }
    m_container.push(item);
  }
  m_buffEmpty.notify_one();
}

int SharedQueue::dequeue()
{
  int tmp = 0;
  {
    boost::mutex::scoped_lock lock(m_mutex);

    while(m_container.size() == 0) // re-check after waking: wakeups can be spurious
    {
      std::cout << "Queue is empty" << std::endl;
      m_buffEmpty.wait(lock);
    }

    tmp = m_container.front();
    m_container.pop();
  }

  m_buffFull.notify_one();
  return tmp;
}
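The timed overload `bool enqueue(int item, int millisecondsTimeout)` is declared above but never defined. Here is one possible sketch of it, written with the C++11 standard-library equivalents (`std::mutex`, `std::condition_variable::wait_for`) rather than Boost; the member names mirror the class above, and the predicate form of `wait_for` handles spurious wakeups:

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <queue>

// Minimal stand-in for the class above, using std:: instead of boost::.
class SharedQueue {
public:
  explicit SharedQueue(size_t size) : m_size(size) {}

  // Returns false if the queue stayed full for the whole timeout.
  bool enqueue(int item, int millisecondsTimeout) {
    std::unique_lock<std::mutex> lock(m_mutex);
    bool hasRoom = m_buffFull.wait_for(
        lock, std::chrono::milliseconds(millisecondsTimeout),
        [this] { return m_container.size() < m_size; });
    if (!hasRoom)
      return false; // timed out while the queue was still full
    m_container.push(item);
    lock.unlock();
    m_buffEmpty.notify_one();
    return true;
  }

  int dequeue() {
    std::unique_lock<std::mutex> lock(m_mutex);
    m_buffEmpty.wait(lock, [this] { return !m_container.empty(); });
    int tmp = m_container.front();
    m_container.pop();
    lock.unlock();
    m_buffFull.notify_one();
    return tmp;
  }

private:
  const size_t m_size;
  std::mutex m_mutex;
  std::condition_variable m_buffEmpty;
  std::condition_variable m_buffFull;
  std::queue<int> m_container;
};
```

Boost.Thread has a matching `timed_wait`/`wait_for` on its condition variables, so the same shape ports back to the boost types used in the question.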


SharedQueue Sq(1000);


void producer()
{
  int i = 0;
  while(true)
  {
    Sq.enqueue(++i);
  }
}

void consumer()
{
  while(true)
  {
    std::cout << "Popping: " << Sq.dequeue() << std::endl;
  }
}

int main()
{

  boost::thread Producer(producer);
  boost::thread Producer1(producer);
  boost::thread Producer2(producer);
  boost::thread Producer3(producer);
  boost::thread Producer4(producer);

  boost::thread Consumer(consumer);

  Producer.join();
  Producer1.join();
  Producer2.join();
  Producer3.join();
  Producer4.join();

  Consumer.join(); 

  return 0;
}

As you can see, I use boost::condition_variable. Is there any way to improve the performance? Should I perhaps consider another synchronization method?

D_E
  • How are you measuring "performance"? – Chad Nov 26 '12 at 21:31
  • Chad, I mean time between placing data into the queue and taking data from it. – D_E Nov 26 '12 at 21:35
  • Is that your bottleneck? – Kiril Nov 26 '12 at 21:41
  • Do you have requirements on CPU utilization? You can be faster without waiting, but your application will consume 100% of a CPU core. This may be desired in some situations. – Chad Nov 26 '12 at 21:41
  • @Chad, nope, an issue is only the time. – D_E Nov 26 '12 at 21:44
  • Then remove the condition variables (keep the locks) and spin wait on each condition. Or move to a lock-less implementation (like Intel's Thread Building Blocks' concurrent queue). – Chad Nov 26 '12 at 21:56
  • @Chad, thanks for your piece of advice! But is your solution applicable in the case if it is about 100 producers and consumers? – D_E Nov 26 '12 at 22:09
  • 1
    Using a lock, only one of the producers/consumers can interact with your queue object at a time. Removing the condition variables ensures that each of them will get into the lock as fast as possible. This can lead to starvation though. Writing performant code is hard :) I'd really suggest looking into the existing tools before reinventing the wheel. – Chad Nov 26 '12 at 22:12
  • 1
    @D_E, I'm afraid if you have about 100 threads, then the queue design is the least of your performance problems (unless you're running on a really fat server with 32+ total cores, in which case 100 threads are OK). – Soonts Nov 26 '12 at 22:15
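Chad's suggestion above (keep the lock, drop the condition variables, and spin on the predicate) could look roughly like this. This is only a sketch of the idea, not Chad's actual code; it trades CPU for latency by polling instead of sleeping:

```cpp
#include <mutex>
#include <queue>
#include <thread>

// Spin-waiting variant of the queue: no condition variables. Each side
// polls its predicate under the lock and yields between attempts.
class SpinQueue {
public:
  explicit SpinQueue(size_t size) : m_size(size) {}

  void enqueue(int item) {
    for (;;) {
      {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_container.size() < m_size) {
          m_container.push(item);
          return;
        }
      } // queue full: release the lock, then retry
      std::this_thread::yield();
    }
  }

  int dequeue() {
    for (;;) {
      {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (!m_container.empty()) {
          int tmp = m_container.front();
          m_container.pop();
          return tmp;
        }
      } // queue empty: release the lock, then retry
      std::this_thread::yield();
    }
  }

private:
  const size_t m_size;
  std::mutex m_mutex;
  std::queue<int> m_container;
};
```

Waiters never sleep in the kernel, so wake-up latency is minimal, but every idle thread keeps burning a core while it polls, which is exactly why this scales poorly to ~100 producers and consumers, as the comments note.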

2 Answers


In real-life scenarios, as opposed to synthetic tests, I think your implementation is good enough.

If, however, you're expecting 10⁶ or more operations per second and you're developing for Windows, then your solution is not that good.

  1. On Windows, Boost's multithreading classes traditionally perform really badly. For mutexes, CRITICAL_SECTION objects are usually much faster. For condition variables, the authors of Boost reinvent the wheel instead of using the corresponding Win32 API.

  2. On Windows, I expect the native multi-producer/multi-consumer queue object, called an "I/O completion port", to be several times more effective than any user-mode implementation possible. Its main goal is I/O, but it's perfectly OK to call the PostQueuedCompletionStatus API to post anything you want to the queue. The only drawback: the queue has no upper limit, so you must limit the queue size yourself.

Soonts

This is not a direct answer to your question, but it might be a good alternative.

Depending on how much you want to increase the performance, it may be worthwhile to take a look at the Disruptor Pattern: http://www.2robots.com/2011/08/13/a-c-disruptor/
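The core of the disruptor idea is a pre-allocated ring buffer indexed by sequence counters instead of a lock around a `std::queue`. Below is a minimal single-producer/single-consumer sketch using C++11 atomics; it is not the implementation from the link, just an illustration of the principle (the class name and power-of-two capacity requirement are this sketch's own choices):

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Single-producer/single-consumer ring buffer. Capacity must be a
// power of two so indices can wrap with a cheap bitmask.
class SpscRing {
public:
  explicit SpscRing(size_t capacityPow2)
      : m_buffer(capacityPow2), m_mask(capacityPow2 - 1),
        m_head(0), m_tail(0) {}

  bool tryPush(int item) {
    size_t head = m_head.load(std::memory_order_relaxed);
    // head and tail are monotonically increasing; full when they
    // are a whole capacity apart.
    if (head - m_tail.load(std::memory_order_acquire) == m_buffer.size())
      return false; // full
    m_buffer[head & m_mask] = item;
    m_head.store(head + 1, std::memory_order_release);
    return true;
  }

  bool tryPop(int& out) {
    size_t tail = m_tail.load(std::memory_order_relaxed);
    if (m_head.load(std::memory_order_acquire) == tail)
      return false; // empty
    out = m_buffer[tail & m_mask];
    m_tail.store(tail + 1, std::memory_order_release);
    return true;
  }

private:
  std::vector<int> m_buffer;
  const size_t m_mask;
  std::atomic<size_t> m_head; // next slot to write (producer only)
  std::atomic<size_t> m_tail; // next slot to read (consumer only)
};
```

Because the producer only writes `m_head` and the consumer only writes `m_tail`, no mutex is needed at all in the single-producer/single-consumer case; the multi-producer variants the disruptor supports are considerably more involved.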

Kiril