With current C++ compilers, std::atomic works with types that are larger than anything the CPU can handle atomically in hardware. On x64 the hardware supports atomics of up to 16 bytes, but std::atomic also works with larger structs. Look at this code:
#include <iostream>
#include <atomic>
using namespace std;
struct S { size_t a, b, c; }; // 24 bytes on x64
atomic<S> apss;
int main()
{
    auto ref = apss.load( memory_order_relaxed );
    // try to swap in a new value if apss still equals ref
    apss.compare_exchange_weak( ref, { 123, 456, 789 } );
    cout << sizeof ::apss << endl;
}
On my platform the cout above always prints 32. But how do these transactions actually work without a mutex? I don't get any clue from inspecting the disassembly.
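One thing that can be checked without reading the disassembly is whether the implementation claims the type is lock-free at all, and how many extra bytes the atomic wrapper carries. The following is only a minimal sketch: `Triple`, `triple_always_lock_free` and `triple_overhead` are hypothetical names introduced here, and it assumes a C++17 compiler. The results are implementation-specific; on x64 I would expect the lock-free query to be false for a 24-byte struct, and the overhead to be 8 bytes on a platform where the sizeof above prints 32.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical stand-in with the same 24-byte layout as S in the question.
struct Triple { std::size_t a, b, c; };

// True only if every std::atomic<Triple> object is guaranteed to be
// lock-free on this target (compile-time property, C++17).
constexpr bool triple_always_lock_free()
{
    return std::atomic<Triple>::is_always_lock_free;
}

// Number of bytes the atomic wrapper adds on top of the payload
// (padding for alignment, or room for some kind of lock).
constexpr std::size_t triple_overhead()
{
    return sizeof( std::atomic<Triple> ) - sizeof( Triple );
}
```

Printing both values from main() next to sizeof ::apss shows whether the extra bytes coincide with a non-lock-free implementation.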
If I run the following code with MSVC++:
#include <atomic>
#include <thread>
#include <array>
using namespace std;
struct S { size_t a, b, c, d, e; }; // 40 bytes on x64
atomic<S> apss;
int main()
{
    array<jthread, 2> threads;
    auto threadFn = []()
    {
        auto ref = apss.load( memory_order_relaxed );
        // hammer the same atomic from both threads
        for( size_t i = 10'000'000; i--; apss.compare_exchange_weak( ref, { } ) );
    };
    threads[0] = jthread( threadFn );
    threads[1] = jthread( threadFn );
    // the jthread destructors join both threads here
}
The code consumes almost no kernel time, so the contention appears to be handled entirely in user space. I guess some kind of software transactional memory is happening here.
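For comparison, software transactional memory is not the only mechanism that would produce these observations. A small spinlock stored next to the data would also stay entirely in user space and would also make the object larger than its payload. The following is a hedged sketch of that idea only; it is not taken from any standard library, and `Payload` and `SpinlockedPayload` are hypothetical names introduced for illustration.

```cpp
#include <atomic>
#include <cstddef>
#include <cstring>

// Hypothetical payload with the same layout as S in the question.
struct Payload { std::size_t a, b, c, d, e; };

// Illustrative only: one way an oversized object could be made atomic
// without ever calling into the kernel. Not a real library implementation.
class SpinlockedPayload
{
    std::atomic<bool> locked_{ false }; // the only hardware-atomic member
    Payload value_{};                   // plain data, guarded by locked_

    void lock()
    {
        // Busy-wait in user space; no system call is made while spinning.
        while( locked_.exchange( true, std::memory_order_acquire ) )
            ;
    }
    void unlock() { locked_.store( false, std::memory_order_release ); }

public:
    Payload load()
    {
        lock();
        Payload copy = value_;
        unlock();
        return copy;
    }

    // Behaves like compare_exchange_strong: bytewise comparison, and
    // expected is overwritten with the current value on failure.
    bool compare_exchange( Payload &expected, const Payload &desired )
    {
        lock();
        bool equal = std::memcmp( &value_, &expected, sizeof( Payload ) ) == 0;
        if( equal )
            value_ = desired;
        else
            expected = value_;
        unlock();
        return equal;
    }
};
```

In this sketch a contended thread simply burns CPU cycles in the while loop, which would show up as user time rather than kernel time, and sizeof( SpinlockedPayload ) is larger than sizeof( Payload ) because of the lock flag and its padding.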