I cannot understand how POSIX allows any thread to unlock (post) a semaphore. Consider the following example:
// sem1 and sem2 are POSIX semaphores,
// properly initialized for single-process use
// at this point, sem2 is locked, sem1 is unlocked
// x and y are global (non-atomic, non-volatile) integer variables
// thread 1 - is executing right now
rc = sem_wait(&sem1); // succeeded, semaphore is 0 now
x = 42;
y = 142;
sem_post(&sem2);
while (true);
// thread 2. waits for sem2 to be unlocked by thread1
sem_wait(&sem2);
sem_post(&sem1);
// thread 3
sem_wait(&sem1); // wakes up when sem1 is unlocked by thread2
#ifdef __cplusplus
std::cout << "x: " << x << "; y: " << y << "\n";
#else
printf("x: %d; y: %d\n", x, y);
#endif
Now, according to everything I've read, this code is 100% kosher for passover. In thread 3, we are guaranteed to see x as 42 and y as 142. We are protected from any race.
But this is what I can't understand. Those threads can potentially execute on 3 different cores. And if the chip doesn't have internally strong memory ordering (ARM, PowerPC), or writes are non-atomic (x86 for unaligned data), how can thread 2 on Core2 possibly request that Core1 (busy with thread 1) release the data / complete the writes / etc.? As far as I know, there are no such commands!
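To make the mechanism I'm asking about concrete: as I understand it, a semaphore could in principle be built on release/acquire atomics, where each barrier runs on the core that executes it, never remotely. This is only a toy sketch (not glibc's actual implementation, and the names are my own), using C11 atomics:

```c
// Toy semaphore sketch: the release barrier runs on the POSTING core,
// the acquire barrier on the WAITING core. No core ever has to reach
// into another core's cache directly; cache coherency plus the paired
// barriers do the rest.
#include <stdatomic.h>

typedef struct { atomic_int value; } toy_sem;

void toy_post(toy_sem *s) {
    // Release on Core1: all of this thread's earlier plain writes
    // (e.g. x = 42) become visible before the new count can be seen.
    atomic_fetch_add_explicit(&s->value, 1, memory_order_release);
}

void toy_wait(toy_sem *s) {
    for (;;) {
        int v = atomic_load_explicit(&s->value, memory_order_relaxed);
        // Acquire on the successful decrement, on Core2: once this
        // thread observes the posted count, it also observes every
        // write that preceded the matching toy_post.
        if (v > 0 && atomic_compare_exchange_weak_explicit(
                         &s->value, &v, v - 1,
                         memory_order_acquire, memory_order_relaxed))
            return;
        // A real implementation would block (e.g. on a futex)
        // instead of spinning.
    }
}
```

If this is roughly how it works, then no "remote barrier" instruction is needed; the question is whether that picture is right.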
What am I missing here?
EDIT: Please note, the suggested duplicate doesn't answer my question. It reiterates my statement, but doesn't explain how the effect can possibly be achieved. In particular, it doesn't explain how Core2 could put a memory barrier on data inside Core1's cache.