If I want to implement a 128-bit atomic type on x64, can I get by with _mm_store_si128 and _mm_load_si128 to avoid cmpxchg16b for relaxed loads and stores?

(If needed, you can assume that only load and store are required, although it would be good if I could mix them with cmpxchg16b.)
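
For concreteness, here is a minimal sketch of the kind of wrapper I have in mind. The names U128 and Atomic128 are hypothetical, and the code simply assumes the very thing I am asking about: that an aligned 16-byte SSE load or store is never torn by the hardware. It also does nothing to tell the compiler that these accesses are atomic, so it should be read as an illustration of the idea rather than a working implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2: _mm_load_si128, _mm_store_si128

// Hypothetical 128-bit value type; the name and layout are illustrative only.
struct U128 {
    std::uint64_t lo;  // bits 0..63
    std::uint64_t hi;  // bits 64..127
};

// Hypothetical wrapper. ASSUMPTION (unverified, this is the open question):
// an aligned 16-byte SSE access is carried out as one indivisible operation.
// The intrinsics are ordinary memory accesses as far as the compiler is
// concerned, so nothing here prevents it from reordering or splitting them.
class Atomic128 {
public:
    explicit Atomic128(U128 v = U128{0, 0}) { store_relaxed(v); }

    U128 load_relaxed() const {
        assert(is_aligned());
        __m128i x = _mm_load_si128(&storage_);  // aligned 16-byte load
        U128 out;
        // Copy the vector into the plain struct; no alignment needed here.
        _mm_storeu_si128(reinterpret_cast<__m128i*>(&out), x);
        return out;
    }

    void store_relaxed(U128 v) {
        assert(is_aligned());
        // _mm_set_epi64x takes the high half first, then the low half.
        __m128i x = _mm_set_epi64x(static_cast<long long>(v.hi),
                                   static_cast<long long>(v.lo));
        _mm_store_si128(&storage_, x);  // aligned 16-byte store
    }

private:
    bool is_aligned() const {
        return reinterpret_cast<std::uintptr_t>(&storage_) % 16 == 0;
    }

    alignas(16) __m128i storage_;
};
```

The intended usage would be something like `Atomic128 a(U128{1, 2}); U128 v = a.load_relaxed();`, with a cmpxchg16b-based compare-exchange added on the same storage if mixing the two turns out to be safe.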