I am experimenting with lock-free algorithms. In some cases I only need an atomic store() and no compare_exchange. I ran a simple measurement to get a feel for the performance advantage of store() over compare_exchange. To my surprise, compare_exchange is faster!
Why is that, or am I making a measurement error?
This is the measurement; all operations are relaxed:
#include <atomic>
#include <chrono>
#include <iomanip>
#include <iostream>

using namespace std;

int main() {
    using ValueType = unsigned long long;
    constexpr auto MemoryOrder = memory_order_relaxed;
    atomic<ValueType> value{};
    cout << "value.is_lock_free(): " << value.is_lock_free() << endl;
    ValueType oldValue{};
    constexpr auto MAX = 100000000;
    //===========================================================
    // Benchmark 1: plain relaxed store
    cout << "value.store: " << MAX << endl;
    auto start = chrono::high_resolution_clock::now();
    for (ValueType newValue{1ULL}; newValue < MAX; ++newValue) {
        value.store(newValue, MemoryOrder);
        oldValue = newValue;
    }
    auto end = chrono::high_resolution_clock::now();
    chrono::duration<double> diffStore = end - start;
    cout << "diffStore: " << diffStore.count() << " s" << endl;
    //===========================================================
    // Benchmark 2: relaxed compare_exchange_weak (return value is ignored)
    cout << "value.compare_exchange_weak: " << MAX << endl;
    value.store(0, MemoryOrder);
    oldValue = 0ULL;
    start = chrono::high_resolution_clock::now();
    for (ValueType newValue{1ULL}; newValue < MAX; ++newValue) {
        value.compare_exchange_weak(oldValue, newValue, MemoryOrder, MemoryOrder);
        oldValue = newValue;
    }
    end = chrono::high_resolution_clock::now();
    chrono::duration<double> diffCompare = end - start;
    cout << std::setw(9);
    cout << "diffCompare: " << diffCompare.count() << " s" << endl;
    // Ratio of the two durations (> 1 means store was slower)
    cout << "Store/Compare: " << diffStore / diffCompare << endl;
}
Output:
value.is_lock_free(): 1
value.store: 100.000.000
diffStore: 5.77747 s //Duration
value.compare_exchange_weak: 100.000.000
diffCompare: 4.44158 s //Duration
Store/Compare: 1.30077
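
One thing that could be checked to rule out a measurement error is whether the exchanges actually succeed: compare_exchange_weak is allowed to fail spuriously, and the loop above ignores its return value. Below is a minimal sketch that repeats the same loop and counts the outcomes. The helper countCasOutcomes and the struct CasCounts are hypothetical names introduced only for this check; they are not part of the benchmark above.

```cpp
#include <atomic>

// Hypothetical helper type (not in the original benchmark):
// tallies how the exchanges in the loop turned out.
struct CasCounts {
    unsigned long long succeeded;
    unsigned long long failed;
    unsigned long long finalValue;
};

// Runs the same loop as the compare_exchange_weak benchmark,
// but records whether each exchange succeeded.
CasCounts countCasOutcomes(unsigned long long max) {
    std::atomic<unsigned long long> value{0};
    unsigned long long oldValue = 0;
    CasCounts counts{0, 0, 0};

    for (unsigned long long newValue = 1; newValue < max; ++newValue) {
        // On failure, compare_exchange_weak writes the current value
        // of the atomic into oldValue and leaves the atomic unchanged.
        if (value.compare_exchange_weak(oldValue, newValue,
                                        std::memory_order_relaxed,
                                        std::memory_order_relaxed)) {
            ++counts.succeeded;
        } else {
            ++counts.failed;
        }
        // Same as the benchmark: the next expected value is set unconditionally.
        oldValue = newValue;
    }

    counts.finalValue = value.load(std::memory_order_relaxed);
    return counts;
}
```

Calling countCasOutcomes(100000000) and printing the three fields would show whether the timed loop really performed that many successful exchanges. Note that because oldValue is overwritten unconditionally, a single failed exchange would leave value one step behind, so every later exchange in this loop would fail as well. If failed is nonzero, the two loops are not doing comparable work.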