How would you achieve 128-bit atomic operations on x86?
Section 8.1, "Locked Atomic Operations", of Intel's System Programming Guide, Part 1, lists the 16-, 32-, and 64-bit operations that are guaranteed to be atomic. So, can you achieve a 128-bit atomic operation by doing two 64-bit operations with the LOCK prefix? Something like:
LOCK mov 64bits->addr
LOCK mov 64bits->addr+64bits
Apparently SSE has 128-bit XMM registers. Can you just do a 128-bit compare-and-swap using these registers?
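For concreteness, here is a minimal sketch of the kind of 128-bit compare-and-swap I have in mind. It assumes x86-64, GCC or Clang inline assembly, and a CPU that supports the CMPXCHG16B instruction (some early AMD64 parts do not). Note that CMPXCHG16B takes its operands in the general-purpose register pairs RDX:RAX and RCX:RBX rather than in XMM registers, and its memory operand must be 16-byte aligned. The names `u128_pair` and `cas128` are hypothetical, made up for this illustration.

```c
#include <stdint.h>
#include <stdbool.h>

/* Two 64-bit halves, forced to 16-byte alignment because
   CMPXCHG16B faults on a misaligned memory operand. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} __attribute__((aligned(16))) u128_pair;

/* Hypothetical helper: if *dst equals *expected, atomically store
   `desired` into *dst and return true. Otherwise copy the current
   value of *dst into *expected and return false. */
static bool cas128(volatile u128_pair *dst, u128_pair *expected,
                   u128_pair desired)
{
    unsigned char ok;
    __asm__ __volatile__(
        "lock cmpxchg16b %1\n\t"   /* compare RDX:RAX with *dst        */
        "sete %0"                  /* ZF is set when the swap happened */
        : "=&q"(ok), "+m"(*dst),
          "+a"(expected->lo), "+d"(expected->hi)
        : "b"(desired.lo), "c"(desired.hi)
        : "memory", "cc");
    return ok != 0;
}
```

With something like this, a caller would load the old pair, compute the new pair, and retry `cas128` in a loop until it returns true, so both halves change in one step instead of as two separate 64-bit writes.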