6

I know how to atomically write a value in x86 ASM. But how do I read one? The LOCK prefix can't be used with mov.

To increase a value, I am doing:

lock inc dword ptr Counter

How do I read Counter in a thread-safe way?

starblue
IamIC

4 Answers

5

As I explained in this post:

Accesses to cacheable memory that are split across bus widths, cache lines, and page boundaries are not guaranteed to be atomic by the Intel Core 2 Duo, Intel Core Duo, Pentium M, Pentium 4, Intel Xeon, P6 family, Pentium, and Intel486 processors. The Intel Core 2 Duo, Intel Core Duo, Pentium M, Pentium 4, Intel Xeon, and P6 family processors provide bus control signals that permit external memory subsystems to make split accesses atomic; however, nonaligned data accesses will seriously impact the performance of the processor and should be avoided.

So use:

LOCK        CMPXCHG   [J], EAX

LOCK CMPXCHG first acts as a memory fence and then compares EAX with the destination value; if they are not equal, the destination value is loaded into EAX.
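The same compare-and-swap trick can be sketched in C11 atomics. This is only an illustration under assumptions: `read_via_cas` is a hypothetical helper name, and a compiler targeting x86 would typically turn the call into a `lock cmpxchg`. As the comments under this answer point out, a plain load is already atomic for an aligned value, so this is a sketch of the technique rather than a recommendation.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: read *p by attempting a compare-and-swap
   that never changes the stored value. */
static uint32_t read_via_cas(_Atomic uint32_t *p) {
    uint32_t expected = 0;
    /* If *p == 0, the CAS stores 0 over 0 (no visible change).
       Otherwise it fails and copies the current value into expected. */
    atomic_compare_exchange_strong(p, &expected, 0);
    return expected;
}
```

Either way the function returns the value that was in memory, and memory is left unchanged, which mirrors what the instruction leaves in EAX.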

EDIT: LINKs to:

Intel® 64 and IA-32 Architectures Software Developer’s Manuals

In Volume 3A: System Programming Guide check section 8.1.1

Also check: Optimization Reference Manual section: CHAPTER 7 OPTIMIZING CACHE USAGE

GJ.
  • That won't compile since [J] is a memory pointer. It has to be a register value. This is the catch-22 I can't get around. – IamIC Jul 28 '10 at 15:40
  • I see from your other post that this actually isn't an issue so long as the value is aligned and within the CPU's bus width. – IamIC Jul 28 '10 at 15:53
  • @IamIC: Not the bus width, exactly. The lowest common denominator across Intel's and AMD's guarantees is that `mov` load/store is atomic [if it doesn't cross an 8-byte boundary (for cached accesses).](https://stackoverflow.com/a/36685056/224132) Or for uncached, if it's aligned or a 16-bit access that doesn't cross a dword boundary. Also `[J]` is simply an absolute or (in x86-64) a RIP-relative addressing mode. It's not double-indirection. It assembles just fine. MASM syntax would often omit the `[]`, but they're optional in MASM and required in NASM. – Peter Cordes Jan 01 '18 at 01:42
  • Anyway, downvoted for incorrectly implying that you need `lock cmpxchg` for a load. **On static data, simply use `ALIGN 4` before the `J:` label.** – Peter Cordes Jan 01 '18 at 01:43
  • Where do I imply such? – IamIC Jan 01 '18 at 02:16
4

I'm not an assembly expert, but word-sized (on x86, 32-bit) reads/writes should be atomic already.

You need to lock the increment because it is both a read AND a write.
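A minimal C11 sketch of that distinction, assuming a hypothetical shared `counter` and helper names that are not from the question. On x86, GCC and Clang typically compile the increment to a LOCK-prefixed instruction and the read to an ordinary `mov`, but that is compiler behaviour rather than something the code itself guarantees.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared counter; _Atomic keeps it naturally aligned. */
static _Atomic uint32_t counter = 0;

/* Read-modify-write: the load, add, and store must happen as one
   indivisible step, so this needs a locked instruction on x86. */
static void counter_increment(void) {
    atomic_fetch_add(&counter, 1);
}

/* Pure load of an aligned 32-bit value: no LOCK prefix required. */
static uint32_t counter_read(void) {
    return atomic_load(&counter);
}
```

Any thread may call `counter_read()` while others call `counter_increment()`; the reader sees some value the counter actually held, never a torn one.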

Mike Caron
  • Not always! If the memory address is in a cache that a second CPU in a multi-CPU system is using, the read isn't guaranteed to be atomic. So use "LOCK CMPXCHG [var], EAX", which first fences the memory cache. – GJ. Jul 28 '10 at 06:03
  • @GJ: I think this only applies to misaligned data. Normally you would not have misaligned data, so it shouldn't be an issue? – Paul R Jul 28 '10 at 08:25
  • I know the read wouldn't be atomic, but it would still be a snapshot, which means the value should be correct, surely? Even if you had 2 CPUs and their cache was being synchronized, I don't think LOCK is going to play any part in ensuring the value is the latest before the var is read... or will it? – IamIC Jul 28 '10 at 15:43
  • @Paul R: Not when two threads are running at the same time, each on its own CPU, and accessing the same memory address. In that case cache synchronisation is needed. Some instructions, like "LOCK CMPXCHG", do this automatically. Instructions like MOV first need a memory fence instruction to synchronise cached memory. Check: Intel® 64 and IA-32 Architectures Software Developer’s Manuals. I have added links to my answer. – GJ. Jul 28 '10 at 16:55
  • @GJ.: `mov` load/store is atomic if the address is aligned. x86 has coherent caches, so even if multiple CPUs are reading / writing the same location, [you won't get tearing unless the value is misaligned](https://stackoverflow.com/questions/36624881/why-is-integer-assignment-on-a-naturally-aligned-variable-atomic-on-x86). – Peter Cordes Jan 01 '18 at 01:37
1

For a simple read, it's mostly about alignment. The easiest way to ensure atomic reading is to always use "natural" alignment, i.e., the alignment is at least as great as the size of the item (e.g., a 32-bit item is 32-bit aligned).

Misaligned reads aren't necessarily atomic. For an extreme example, consider reading a 32-bit value at an odd address where the first byte is in one cache line, and the other three bytes are in another cache line. In such a case, an atomic read is essentially impossible.
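A small C sketch of those two checks. The helper names and the 64-byte cache-line size are assumptions for illustration; the line size is common on x86 but not guaranteed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache-line size in bytes (typical for x86, not guaranteed). */
#define ASSUMED_CACHE_LINE 64u

/* True if an item of `size` bytes at `addr` is naturally aligned. */
static bool is_naturally_aligned(uintptr_t addr, size_t size) {
    return addr % size == 0;
}

/* True if the bytes [addr, addr + size) straddle a multiple of `boundary`. */
static bool crosses_boundary(uintptr_t addr, size_t size, size_t boundary) {
    return (addr % boundary) + size > boundary;
}
```

For the extreme example above, a 4-byte value at address 63 has one byte in the first line and three in the next, so `crosses_boundary(63, 4, ASSUMED_CACHE_LINE)` is true, while the same value at address 60 fits within one line.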

Since (at least most) processors use a 64-bit wide memory bus, the largest item that a single plain load can be expected to read atomically is 64 bits.

Jerry Coffin
0

It is interesting to read the other replies. I think @GJ is probably on the money.

For many years it was always true that 32-bit reads and writes were atomic. It is only in recent years, with really aggressive caching, that this is no longer guaranteed.

I guess that's why I prefer C++, Java or some such between me and the machine code. These days the machine code is too complex to write reliably (unless you do it a lot to keep your skills sharp). Luckily, today's optimising compilers are so good that you seldom need the performance of hand-optimised assembler.

Michael J