How do I atomically read a value in x86 ASM?

Question

I know how to atomically write a value in x86 ASM. But how do I read one? The LOCK prefix can't be used with mov.

To increase a value, I am doing:

lock inc dword ptr Counter

How do I read Counter in a thread-safe way?

score 5 · Answer 1 · edited May 23 '17 at 12:33

As I explained in this post:

Accesses to cacheable memory that are split across bus widths, cache lines, and page boundaries are not guaranteed to be atomic by the Intel Core 2 Duo, Intel Core Duo, Pentium M, Pentium 4, Intel Xeon, P6 family, Pentium, and Intel486 processors. The Intel Core 2 Duo, Intel Core Duo, Pentium M, Pentium 4, Intel Xeon, and P6 family processors provide bus control signals that permit external memory subsystems to make split accesses atomic; however, nonaligned data accesses will seriously impact the performance of the processor and should be avoided.

So use:

LOCK        CMPXCHG   EAX, [J]

LOCK CMPXCHG first acts as a fence on cached memory and then compares EAX with the destination value; if the two are not equal, the destination value is loaded into EAX.

EDIT: Links to:

Intel® 64 and IA-32 Architectures Software Developer’s Manuals

In Volume 3A: System Programming Guide check section 8.1.1

Also check the Optimization Reference Manual, Chapter 7: Optimizing Cache Usage

answered Jul 28 '10 at 05:52

GJ.

That won't compile since [J] is a memory pointer. It has to be a register value. This is the catch-22 I can't get around. – IamIC Jul 28 '10 at 15:40
I see from your other post that this actually isn't an issue so long as the value is aligned and within the CPU's bus width. – IamIC Jul 28 '10 at 15:53
@IamIC: Not the bus width, exactly. The lowest common denominator across Intel's and AMD's guarantees is that `mov` load/store is atomic [if it doesn't cross an 8-byte boundary (for cached accesses).](https://stackoverflow.com/a/36685056/224132) Or for uncached, if it's aligned or a 16-bit access that doesn't cross a dword boundary. Also `[J]` is simply an absolute or (in x86-64) a RIP-relative addressing mode. It's not double-indirection. It assembles just fine. MASM syntax would often omit the `[]`, but they're optional in MASM and required in NASM. – Peter Cordes Jan 01 '18 at 01:42
Anyway, downvoted for incorrectly implying that you need `lock cmpxchg` for a load. **On static data, simply use `ALIGN 4` before the `J:` label.** – Peter Cordes Jan 01 '18 at 01:43
Where do I imply such? – IamIC Jan 01 '18 at 02:16

score 4 · Accepted Answer · answered Jul 28 '10 at 04:11

I'm not an assembly expert, but word-sized (on x86, 32-bit) reads/writes should be atomic already.

The reason you need to lock the increment is because that's both a read AND a write.
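
To make the load versus read-modify-write distinction concrete, here is a minimal C11 sketch (C rather than assembly so it can be compiled and checked). The names `counter`, `counter_increment` and `counter_read` are hypothetical stand-ins for the `Counter` variable in the question. On x86, compilers typically emit a `lock`-prefixed instruction for the increment and a plain `mov` for the load, but that depends on the compiler and flags, so treat it as an expectation rather than a guarantee.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical shared counter, standing in for Counter in the question.
   Static storage means it starts at zero. */
static atomic_int counter;

/* Compile-time check that atomic_int never falls back to a lock. */
static_assert(ATOMIC_INT_LOCK_FREE == 2, "atomic_int is not always lock-free");

/* Read-modify-write: the load, add and store must happen as one unit,
   which is why the assembly version needs the lock prefix. */
void counter_increment(void) {
    atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);
}

/* Pure load: a naturally aligned 32-bit read is a single access,
   so no lock prefix is needed for atomicity. */
int counter_read(void) {
    return atomic_load_explicit(&counter, memory_order_relaxed);
}
```

Relaxed ordering is used here only because the question asks about atomicity of the access itself; if other data must be ordered relative to the counter, a stronger memory order would be the thing to reach for.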

Mike Caron

Not always! If the memory address is in a cache that a second CPU is using in a multi-CPU system, the read isn't guaranteed to be atomic. So use "LOCK CMPXCHG EAX, [var]", which first fences the memory cache. – GJ. Jul 28 '10 at 06:03
@GJ: I think this only applies to misaligned data. Normally you would not have misaligned data, so it shouldn't be an issue? – Paul R Jul 28 '10 at 08:25
I know the read wouldn't be atomic, but it would still be a snapshot, which means the value should be correct, surely? Even if you had 2 CPUs and their cache was being synchronized, I don't think LOCK is going to play any part in ensuring the value is the latest before the var is read... or will it? – IamIC Jul 28 '10 at 15:43
@Paul R: Not in the case where two threads are running at the same time, each on its own CPU, and accessing the same memory address. In that case cache synchronisation is needed. Some instructions like "LOCK CMPXCHG" do this automatically. Instructions like MOV first need a memory fence instruction to synchronise cached memory. Check: Intel® 64 and IA-32 Architectures Software Developer’s Manuals. I have added links to my answer. – GJ. Jul 28 '10 at 16:55
@GJ.: `mov` load/store is atomic if the address is aligned. x86 has coherent caches, so even if multiple CPUs are reading / writing the same location, [you won't get tearing unless the value is misaligned](https://stackoverflow.com/questions/36624881/why-is-integer-assignment-on-a-naturally-aligned-variable-atomic-on-x86). – Peter Cordes Jan 01 '18 at 01:37

Jerry Coffin · Answer 3 · 2010-07-28T04:45:06.983

For a simple read, it's mostly about alignment. The easiest way to ensure atomic reads is to always use "natural" alignment -- i.e., the alignment is at least as great as the size of the item (e.g., a 32-bit item is 32-bit aligned).

Misaligned reads aren't necessarily atomic. For an extreme example, consider reading a 32-bit value at an odd address where the first byte is in one cache line, and the other three bytes are in another cache line. In such a case, an atomic read is essentially impossible.

Since (at least most) processors use a 64-bit wide memory bus, the largest item that can hope to be read atomically is 64 bits.
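
The alignment conditions above can be sketched as two small checks. This is illustration only: the helper names `is_naturally_aligned` and `fits_in_cache_line` are made up here, and the 64-byte cache line size is an assumption that holds for many x86 processors but is not guaranteed for all of them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size in bytes; not guaranteed on every processor. */
#define ASSUMED_CACHE_LINE 64u

/* Returns 1 if an item of `size` bytes starting at `addr` is naturally
   aligned, i.e. the address is a multiple of the item size. */
int is_naturally_aligned(uintptr_t addr, size_t size) {
    return (addr % size) == 0;
}

/* Returns 1 if the bytes [addr, addr + size) all lie in one cache line,
   so the access cannot be split across two lines. */
int fits_in_cache_line(uintptr_t addr, size_t size) {
    return (addr / ASSUMED_CACHE_LINE) ==
           ((addr + size - 1) / ASSUMED_CACHE_LINE);
}
```

For instance, a 32-bit item at address 0x103F has its first byte at the end of one 64-byte line and the other three bytes in the next, which is the extreme case described above; `fits_in_cache_line(0x103F, 4)` reports it as split.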

score 0 · Answer 4 · answered Jul 28 '10 at 17:11

It is interesting to read the other replies. I think @GJ is probably on the money.

For many years it was always true that 32-bit read and write was atomic. It is only in recent years with the really aggressive caching that this is no longer guaranteed.

I guess that's why I prefer C++, Java or some such between me and the machine code. These days the machine code is too complex to write reliably (unless you do it a lot to keep your skills sharp). Luckily, today's optimising compilers are so good that you seldom need the performance of hand-optimised assembler.

Michael J

C++ will not guarantee anything about memory semantics beyond what the CPU does, nor will Java without volatile. – Axel Gneiting Aug 27 '12 at 07:46
C++ can guarantee atomic accesses for atomic types (C++11 stuff), as does C (C11). – Marwan Burelle Jun 03 '14 at 13:51
