
If I have an integer x, which of the following statements are atomic on the ARM architecture on an iPhone?

int x;
int y;
x = 92;  // . . . . . .   A
x++;     // . . . . . .   B
y = ++x; // . . . . . .   C
printf("x = %d\n", x); // D

I know that on the i386 platform, statements A, B and D are atomic, while C is not. I'm pretty sure that C is not atomic on iOS. I suspect that basic load and store operations (D and A) are atomic on iOS too, but I'm not sure. Does anyone know more?

How is it with 16-bit and 8-bit values? Or with 64-bit values on the iPhone 5S, and with 64-bit values on the iPhone 5 and below?

(If the answer is as I suspect... Is there any platform where basic load and store operations are not atomic?)

yurish
Michael
  • I'm not so sure that the B is atomic on i386 :) What is it you are trying to do? Can you use OS synchronization primitives for this? For example OSAtomic... family of functions https://developer.apple.com/library/ios/documentation/Cocoa/Conceptual/Multithreading/ThreadSafety/ThreadSafety.html#//apple_ref/doc/uid/10000057i-CH8-SW14 – yurish Dec 16 '13 at 17:43
  • They are not atomic in the sense that they create a full memory barrier, but otherwise they are, I think. E.g. if x is 23 and one thread changes it by calling x++ and another thread accesses x at the same time, that other thread will get either 23 or 24 when it loads x, but not -817376 or 21. – Michael Dec 16 '13 at 18:00
  • I may be wrong, but I think that atomicity is a language thing, not a platform thing. When you say that the above operations are atomic, you rely on compiler implementation details. – yurish Dec 16 '13 at 18:20
  • Well, my question was about the LLVM compiler on the ARM platform. This was kind of implied by the question, because putting any other kind of code onto an iOS device is very non-standard. Earlier versions of Xcode may use the GCC compiler (I think), but I don't think that this changes anything. – Michael Dec 16 '13 at 18:33
  • e.g. can I rewrite the code here: https://github.com/robbiehanson/CocoaAsyncSocket/blob/master/GCD/GCDAsyncSocket.m#L3011 to just `return !(flags & kSocketStarted);`? If no, how wrong will it be? If I call `__sync_synchronize();` before the return, will it be correct? – Michael Dec 16 '13 at 18:38
  • Can you give an example of what is not atomic? What would a value look like in that case? – auselen Dec 16 '13 at 18:56
  • @Michael If the value of x is 23 and two threads use `x++`, you may get either 25 (frequently) or 24 (infrequently) as a result. This is common for a *use count* of garbage collected items. This is not a *language* thing except for [`sig_atomic_t`](http://stackoverflow.com/questions/8488791/proper-usage-of-volatile-sig-atomic-t), which allows a write in a single atomic cycle; this question is very CPU specific and makes the code non-portable. – artless noise Dec 16 '13 at 18:57
  • Yeah, I guess you're right ;) – Michael Dec 16 '13 at 19:00

1 Answer


You don't give enough context on the declaration of x and y. If they are locals within a function, then they will typically be assigned to registers and other threads cannot touch them. So I assume you mean they are global (or at least static).

The ARM is a load-store architecture. It does not have memory-to-memory instructions, so B and C each become a separate load, add and store, and another thread can run between those steps. So really only lines A/D are atomic. In A you have unconditionally written the value, but the write is not ordered versus another thread. If one thread writes 92 and another writes 29, there is no way to know which value ends up in memory without a mutex of some sort.

Early ARM CPUs have swp, but most iOS products will use ldrex and strex. You must use these instructions for any sort of atomic update (read-modify-write).
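As a rough sketch of what such an atomic update looks like from C, the retry loop below uses the C11 `<stdatomic.h>` interface rather than hand-written assembly. The function name `atomic_increment` is hypothetical. On ARMv7 a compiler will typically lower a loop like this to an ldrex/strex pair, although the exact instructions depend on the compiler and target.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical helper: atomically add 1 to *value and return the new value.
   The weak compare-exchange may fail when another thread changed *value
   (or spuriously, much like a failed strex), so we retry until it succeeds. */
int atomic_increment(atomic_int *value)
{
    assert(value != NULL);

    int observed = atomic_load(value);
    /* On failure, 'observed' is refreshed with the current contents of
       *value, so the next attempt is computed from up-to-date data. */
    while (!atomic_compare_exchange_weak(value, &observed, observed + 1)) {
        /* retry */
    }
    return observed + 1;
}
```

In everyday code `atomic_fetch_add(&x, 1)` does the same job in one call; the explicit loop is shown only because it mirrors the load-exclusive/store-exclusive pattern.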

The ARM can write 8/16/32/64 bits at a time, and most system designs keep the caches synchronized so that a write by one CPU is seen by another. A ring buffer could be used between a producer and a consumer where only one CPU writes to the ring head and the other to the ring tail; i.e., this would be an atomic structure you can use without swp or ldrex and strex.
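A minimal sketch of such a single-producer/single-consumer ring follows. All names (`spsc_ring`, `ring_push`, `ring_pop`, `RING_CAPACITY`) are hypothetical, and it assumes exactly one producer thread and one consumer thread. It uses C11 acquire/release loads and stores for the indices, which is an addition to the description above: they need no read-modify-write instruction, but they do stop the compiler and CPU from reordering the slot access around the index update.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_CAPACITY 8u  /* hypothetical size; must be a power of two */

struct spsc_ring {
    int slots[RING_CAPACITY];
    atomic_size_t head;   /* next slot to fill; written only by the producer */
    atomic_size_t tail;   /* next slot to drain; written only by the consumer */
};

void ring_init(struct spsc_ring *ring)
{
    assert(ring != NULL);
    atomic_init(&ring->head, 0);
    atomic_init(&ring->tail, 0);
}

/* Producer side: returns false if the ring is full. */
bool ring_push(struct spsc_ring *ring, int item)
{
    size_t head = atomic_load_explicit(&ring->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&ring->tail, memory_order_acquire);
    if (head - tail == RING_CAPACITY)
        return false;
    ring->slots[head % RING_CAPACITY] = item;
    /* Release: the slot contents become visible before the new head. */
    atomic_store_explicit(&ring->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side: returns false if the ring is empty. */
bool ring_pop(struct spsc_ring *ring, int *item)
{
    size_t tail = atomic_load_explicit(&ring->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&ring->head, memory_order_acquire);
    if (head == tail)
        return false;
    *item = ring->slots[tail % RING_CAPACITY];
    /* Release: the slot is read before it is handed back to the producer. */
    atomic_store_explicit(&ring->tail, tail + 1, memory_order_release);
    return true;
}
```

Each index has a single writer, so no update can be lost; that is why plain stores are enough here, whereas a shared counter incremented by several threads needs ldrex/strex.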

It is possible to mess things up if you manually allocate a 64-bit value, although you would have to try hard. For instance, a page fault could occur between the upper and lower 32 bits of the 64-bit value. Unaligned values may also have issues, depending on the CPU type, and this applies to unaligned 32-bit and 16-bit values as well. If you declare things normally, the compiler should align them well. iOS may make these accesses look atomic to user space. Generally, if you rely on atomic behavior, you should declare the variables normally and not do any strange casts.

Edit:

(If the answer is as I suspect... Is there any platform where basic load and store operations are not atomic?)

Yes, there are many platforms where loads/stores of larger values are not atomic, especially if you expect atomicity for every data type. Pragmatically, most CPUs are at least 8-bit (although even 4-bit CPUs exist), so a char load and store are usually atomic. For smaller CPUs, the load/store of larger values may take several instructions. In these cases, the compiler emits multiple instructions to update a value, and an interrupt can arrive between them. This is the idea behind sig_atomic_t; it is a type sized so that it can be updated atomically. sig_atomic_t should be available on all POSIX systems, but there is no general guarantee in plain C99. C11 adds the _Atomic type qualifier.
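For illustration, here is a small sketch of the usual sig_atomic_t pattern: a handler sets a flag, and the main code reads it. The names `signal_seen`, `on_signal` and `deliver_and_check` are hypothetical, and SIGINT is chosen only because it is available in standard C. Note that sig_atomic_t is about a write not being torn by a signal handler; it is not a substitute for _Atomic or a mutex between threads.

```c
#include <assert.h>
#include <signal.h>

/* Flag shared between the handler and normal code. 'volatile' keeps the
   compiler from caching it; sig_atomic_t keeps the write from being torn. */
static volatile sig_atomic_t signal_seen = 0;

static void on_signal(int signum)
{
    (void)signum;
    signal_seen = 1;   /* a single, untorn store */
}

/* Hypothetical demo: install the handler, raise SIGINT in this process,
   restore the previous handler, and report whether the flag was set. */
int deliver_and_check(void)
{
    void (*previous)(int) = signal(SIGINT, on_signal);
    assert(previous != SIG_ERR);

    signal_seen = 0;
    raise(SIGINT);               /* the handler runs before raise() returns */
    signal(SIGINT, previous);    /* put the old disposition back */
    return signal_seen;
}
```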

Community
artless noise
  • Thanks! Yes, I meant global or static variables. What do you mean by "items A/D are atomic"? What does A/D stand for? Did I understand it right that if x is 92 and one thread changes it to 29, then another thread will see either 92 or 29 but not a completely unrelated value? It would be interesting to know how old the cache can become. Is everything always synchronized within milliseconds, or within seconds, or can synchronization occasionally take even longer? – Michael Dec 16 '13 at 19:12
  • Yes. That is correct. Maybe the values `0xffee0000UL` and `0xddccUL` would be better. There is no way to see `0xffeeddccUL` or just `0x0UL` or some other combination. The `str` and `ldr` are a single instruction and operate without interruption. In a multi-cpu design, the caches/memory is reserved and broadcast so different CPUs don't see different values. *A/D* are your items/lines *A* and *D*. – artless noise Dec 16 '13 at 19:17