I am considering options for implementing a queue for a project, one requirement of which is that the producer, at least, must be as low-latency as possible. To this end, I have been investigating "lock-free" queues that use std::atomic to control access to the data structure from the producer and consumer threads. My hope was that this would avoid the overhead of std::mutex, and specifically std::unique_lock, which the code currently uses.
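The sort of structure I have in mind is a minimal single-producer/single-consumer ring buffer, where only the head and tail indices are atomic and neither side ever takes a lock (a sketch only — the class name, capacity, and memory orderings here are illustrative, not from my actual project):

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical SPSC ring buffer: push() is called only by the producer,
// pop() only by the consumer. No mutex is taken on either side.
template <typename T, size_t N>
class SpscQueue
{
public:
    bool push(const T& item)  // producer thread only
    {
        const size_t head = _head.load(std::memory_order_relaxed);
        const size_t next = (head + 1) % N;
        if(next == _tail.load(std::memory_order_acquire))
            return false;  // queue full (holds at most N-1 items)
        _buffer[head] = item;
        _head.store(next, std::memory_order_release);
        return true;
    }

    bool pop(T& item)  // consumer thread only
    {
        const size_t tail = _tail.load(std::memory_order_relaxed);
        if(tail == _head.load(std::memory_order_acquire))
            return false;  // queue empty
        item = _buffer[tail];
        _tail.store((tail + 1) % N, std::memory_order_release);
        return true;
    }

private:
    T _buffer[N];
    std::atomic<size_t> _head{0};
    std::atomic<size_t> _tail{0};
};
```

The release store on one index paired with the acquire load on the other is what publishes each slot between the two threads without a mutex.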
To this end, I have written a simple test program to assess the relative performance of std::mutex (coupled with std::unique_lock) and std::atomic. The program also checks that the atomic object is lock-free, which it is.
#include <mutex>
#include <atomic>
#include <thread>
#include <chrono>
#include <iostream>

#define CYCLES 100000000

void testAtomic()
{
    bool var(true);
    std::atomic_bool _value(true);

    std::cout << "atomic bool is ";
    if(!_value.is_lock_free())
        std::cout << "not ";
    std::cout << "lock free" << std::endl;

    const auto _start_time = std::chrono::high_resolution_clock::now();

    for(size_t counter = 0; counter < CYCLES; counter++)
    {
        var = _value.load();
        var = !var;
        _value.store(var);
    }

    const auto _end_time = std::chrono::high_resolution_clock::now();

    std::cout << 1e-6 * std::chrono::duration_cast<std::chrono::microseconds>
                 (_end_time - _start_time).count() << " s" << std::endl;
}
void testMutex()
{
    bool var(true);
    std::mutex _mutex;

    const auto _start_time = std::chrono::high_resolution_clock::now();

    for(size_t counter = 0; counter < CYCLES; counter++)
    {
        std::unique_lock<std::mutex> lock(_mutex);
        var = !var;
    }

    const auto _end_time = std::chrono::high_resolution_clock::now();

    std::cout << 1e-6 * std::chrono::duration_cast<std::chrono::microseconds>
                 (_end_time - _start_time).count() << " s" << std::endl;
}
int main()
{
    std::thread t1(testAtomic);
    t1.join();

    std::thread t2(testMutex);
    t2.join();

    return 0;
}
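For reference, the load/negate/store sequence in testAtomic is three separate operations rather than one atomic step; it could be collapsed into a single atomic read-modify-write. A sketch of what that would look like (using an unsigned flag rather than a bool, since std::atomic<bool> provides no fetch_xor — this is not part of the timings above):

```cpp
#include <atomic>

// Illustrative variant: an unsigned flag toggled with a single
// read-modify-write, instead of separate load / negate / store steps.
std::atomic<unsigned> _flag(1u);

void toggleOnce()
{
    _flag.fetch_xor(1u);  // atomically flips the low bit in one RMW operation
}
```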
When running this program, I get the following output:
atomic bool is lock free
3.49434 s
2.31755 s
This would indicate to me that std::mutex (and std::unique_lock) is significantly faster, which is the opposite of what I've come to expect from reading about atomics vs. mutexes. Are my findings correct? Is there a problem with my test program? Is my understanding of the differences between the two incorrect?
The code was compiled with GCC 4.8.5 on CentOS 7.