9
uint64_t n;      // two 32-bit integers

return ( (uint32_t)(n >> 32) == (uint32_t)n );

What is the fastest way to atomically compare the 32 most-significant bits to the 32 least-significant bits of a uint64_t?

I think one horrible solution is to do: acquire spinlock, read 32 LSB, read 32 MSB, compare to get result, release spinlock, return result. Is there any way to do this without having to take a spinlock?

brooksbp
  • 1,886
  • 4
  • 16
  • 17

5 Answers5

9

How about using a compare-exchange operation on the two different addresses?

Something like: CMPXCHG (int*)&n, (((int*)&n)+1) (note - this actually cannot work).

Edit: changed the syntax to more closely resemble the actual x86 syntax.

Edit 2: as Serge has pointed out, using two memory addresses in an assembly instruction is not supported for most assemblies, so this way can't work directly from memory. This means that this approach can't be used for comparing the two 32 bits portions of a 64 bit variable in an atomic fashion.

Some assemblies (PowerPC at least) are able to provide special instructions (for PowerPC, LWARX and STWCX) that can make this work in a multi-threaded safe way, but it is not exactly what the OP asked, nor will it work for x86.

Eli Iser
  • 2,788
  • 1
  • 19
  • 29
  • 2
    Wouldn't you need to do `+1` instead of `+sizeof(int)`? – Marlon Jun 26 '11 at 06:41
  • Just got back from implementing it with a compare-and-swap.. very similar to what you posted. Don't know why breaking down the addresses hadn't occurred to me at the time. Thanks! – brooksbp Jun 26 '11 at 06:52
  • @Marlon - yes, you are correct. Updated with the change, thanks. – Eli Iser Jun 26 '11 at 06:54
  • 1
    @brooksbp: "Don't know why breaking down the addresses hadn't occurred to me at the time." May be because this solution is wrong? I have no idea what `CMPXCHG (int*)&n, (((int*)&n)+1), res` means. There is x86 instruction CMPXCHG but it has only 2 operands (and implicit operand in EAX - the number to compare with) and this instruction alone cannot implement the operation requested (obviously more than one instruction requires lock). – Serge Dundich Jun 27 '11 at 13:18
  • @brooksbp: "Just got back from implementing it with a compare-and-swap.." Could you please post your code for this. Because at least for x86 (32-bit) platform it is impossible to be atomic. – Serge Dundich Jun 27 '11 at 13:46
  • @Serge - the CMPXCHG syntax I posted was just for example. I'm working mainly with PowerPC and ARM assembly, which have multiple arguments for an instruction, and I forgot to switch my mind to "x86 mode" :). – Eli Iser Jun 27 '11 at 14:02
  • 1
    @Eli Iser: "the CMPXCHG syntax I posted was just for example" Example of what? Of invalid x86 assembler code inspired by the idea that cannot be used to solve the problem in most cases (including x86 case)? – Serge Dundich Jun 27 '11 at 14:54
  • 3
    Note that in most assemblers (including x86) there are no instructions that can use two distinct memory addresses in a single instruction. So the entire idea of "using a compare-exchange operation on the two different addresses" doesn't work because cannot be implemented atomically on most architectures. – Serge Dundich Jun 27 '11 at 15:53
  • 3
    @brooksbp: This answer is completely wrong as Serge says, and should not be the accepted answer. You may also want to rethink your implementation, as it probably doesn't work atomically like you think. – interjay Jun 27 '11 at 16:03
7

This whole operation (atomic comparison of two values in memory) is meaningless unless you can also ensure that writing them is always atomic. It's also subject to an inherent race condition; by the time you've determined that they're equal they may have changed, or vice versa. Whatever problem you're trying to solve almost surely calls for a lock, not atomic operations.

R.. GitHub STOP HELPING ICE
  • 208,859
  • 35
  • 376
  • 711
3
  1. You can do it without lock only if you are able to retrieve 64-bit number atomically on your platform. If it is possible then first - you atomically retrieve 64-bit value in your favorite way (e.g. InterlockedOr64(ptr,0) on 64-bit windows, there is no way if you have 32-bit x86 CPU - unless you have Intel CPU not older than Pentium and you make sure your 64-bit value is 64-bit aligned, not sure about other vendors' x86 CPUs), second - do your compare with the value you retrieved.

  2. You obviously cannot do it in portable way. On platforms where you cannot atomically get 64-bit number it is impossible to do it without lock.

EDIT

Since some severely misleading ideas gain serious popularity in this discussion I feel my duty to write some notes about failing to use Compare&Exchange of 32-bit numbers to solve the problem.

Let's assume we have x86 platform then we might write asm code:

    mov eax, [num+4]
    lock cmpxchg [num], eax
    jz  equal_case_code
    ; non-equal case code follows
equal_case_code:
    ; equal case code follows

Obviously this implementation is not atomic - thread may be interrupted between mov and cmpxchg instructions (as always two memory operands not allowed in one instruction).

32-bit based Compare&Exchange functions from different APIs (like InterlockedCompareExchange from Win32 API) does not provide the correct solution either because their semantic allows atomic access to one 32-bit memory address only.

Serge Dundich
  • 4,221
  • 2
  • 21
  • 16
  • 1
    On 32-bit x86 you should be able to use LOCK FILD to atomically load 64 bits. – augustss Jun 26 '11 at 09:50
  • @augustss: No you can't do that. "The LOCK prefix can be prepended only to the following instructions and only to those forms of the instructions where the destination operand is a memory operand: ADD, ADC, AND, BTC, BTR, BTS, CMPXCHG, CMPXCH8B, DEC, INC, NEG, NOT, OR, SBB, SUB, XOR, XADD, and XCHG." (Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2A: Instruction Set Reference, A-M). – Serge Dundich Jun 27 '11 at 13:02
  • 1
    I believe `fild` is already atomic (without a lock prefix) at least as long as the 64-bit value is aligned (and thus doesn't cross cache lines). It may not have been on ancient cpus where the fpu was external, but on anything modern it should be... – R.. GitHub STOP HELPING ICE Jun 27 '11 at 21:37
  • @R..: "I believe fild is already atomic (without a lock prefix) at least as long as the 64-bit value is aligned" True (at least for intel CPUs - I haven't checked the other verdors' docs). But I was assuming the solution without alignment restrictions. I'll update my answer to make it more clear. – Serge Dundich Jun 27 '11 at 23:19
  • Plus, your chmpxchg example does not solve the original problem. You could have at least made a working solution using cmpxchg8b, in which case you would not even need a loop. – zvrba Jun 28 '11 at 06:20
  • @zvrba: Care to explain what was "originally wrong" in my answer? My information is from Intel docs (not from your comments) and very minor change in my answer (highlighted in *italics*) is made in response to comment by @R.. Is this your constructive critics? – Serge Dundich Jun 28 '11 at 06:59
  • @zvrba: "Plus, your chmpxchg example does not solve the original problem." My `chmpxchg` is not supposed to solve the problem. It is supposed to show that `CMPXCHG` cannot solve the problem - it is the very point of that example. – Serge Dundich Jun 28 '11 at 07:04
  • @zvrba: Have you read the discussion about Eli Iser's answer (that was selected answer some time ago)? CMPXCHG note was about that. – Serge Dundich Jun 28 '11 at 07:10
0

What about using a union? So like this:

typedef union {
    uint32_t small[2];
    uint64_t full;
} bigint_t;

Then you go do:

uint64_t n;      // two 32-bit integers

bigint_t mybigint;
mybigint.full = n;
return mybigint.small[0] == mybigint.small[1];

I don't know if this is the fastest, but if you don't copy the uint64_t into the union but use the union directly, it should be quite fast, since it does not need to do anything for the comparison.

Lance Roberts
  • 22,383
  • 32
  • 112
  • 130
Markus Pilman
  • 3,104
  • 3
  • 22
  • 31
  • 4
    The behaviour is undefined if you read from a member of a union other than the one last used for writing. – dreamlax Jun 26 '11 at 06:42
  • @dreamlax - are you sure? I'll be happy to see a reference for this, since I think this does work (maybe in specific compilers, but not in the strict C standard?). – Eli Iser Jun 26 '11 at 07:07
  • @dreamlax At least for gcc it works for sure and as far as I remember the unix socket libraries use this mechanism to represent IP addresses. Can you provide a reference? – Markus Pilman Jun 26 '11 at 07:10
  • @Eli @Markus: http://stackoverflow.com/questions/1812348/a-question-about-union-in-c – dreamlax Jun 26 '11 at 07:44
  • 1
    In particular, look at [Andrey's answer](http://stackoverflow.com/questions/1812348/a-question-about-union-in-c/1812932#1812932). – dreamlax Jun 26 '11 at 07:46
  • @dreamlax - great, thanks for the link. I ran this code through VS2010, and it gave the expected result, little-endian and all. – Eli Iser Jun 26 '11 at 09:18
0

Use inline assembly to load the 64-bit integer into a MMX or SSE register (64-bit reads are atomic) and then compare the halves.

zvrba
  • 24,186
  • 3
  • 55
  • 65
  • "(64-bit reads are atomic)" What makes you think so? It is not guaranteed if you have SMP system (unless you use `LOCK` instruction prefix that is not allowed for FPU/MMX/SSE instructions). – Serge Dundich Jun 27 '11 at 15:03
  • Yes, they are. Intel's manual volume 3A, section 8.1.1: Guaranteed Atomic Operations: "The P6 family processors (and newer processors since) guarantee that the following additional memory operation will always be carried out atomically: Unaligned 16-, 32-, and 64-bit accesses to cached memory that fit within a cache line." – zvrba Jun 27 '11 at 15:24
  • @zvrba: The next line in intel's manual after you've quoted: "Accesses to cacheable memory that are split across cache lines and page boundaries are not guaranteed to be atomic by the Intel Core 2 Duo, Intel® Atom™, Intel Core Duo, Pentium M, Pentium 4, Intel Xeon, P6 family, Pentium, and Intel486 processors." What is supposed to make sure that the value is within page boundaries and within cache line? – Serge Dundich Jun 27 '11 at 15:47
  • @zvrba: "Proper alignment". Right. So it means that you depend on the alignment (that is implementation dependent, i.e. compiler specific) or you have to assure proper alignment yourself. Also it is all true for P6 or newer processors (so not true for older processors). And probably not true for some non-intel x86 CPU. IMO too much conditions. – Serge Dundich Jun 27 '11 at 18:07
  • A bunch of "maybes" that are easily verified in practice, if necessary. I haven't seen you come with any reasonable answer to the question. And DO find a solution that works w/o any special setup (alignment, etc.) and that is not CPU-specific. – zvrba Jun 27 '11 at 20:58
  • 1
    @zvrba: "And DO find a solution that works w/o any special setup (alignment, etc.) and that is not CPU-specific." Well, obviously universal solution (that works for any CPU) is using lock. And obviously solution without lock that is not CPU specific does not exist (and thus I cannot provide such solution - nobody can). – Serge Dundich Jun 27 '11 at 22:07
  • @zvrba: "I haven't seen you come with any reasonable answer to the question." Are we discussing my answer? I would gratefully read your constructive critics of my answer (as a comment to my answer). – Serge Dundich Jun 27 '11 at 22:11