Adding a synchronization point in x86 assembly

Question

I need a way to make four threads continue from a certain point at roughly the same time. For example:

thread 1: mov eax, ebx, mov ecx, edx, [S], mov eax, edx, ...
thread 2: sbb eax, ebx, [S], mov ecx, edx, ...
thread 3: mov eax, ebx, xchg eax, ebx, cmp edx, ecx, [S], mov eax, ebx, ...
thread 4: dec eax, sub eax, ecx, [S], ...

[S] is a placeholder for a 'synchronization point'. Once all threads have reached this point, they should all continue at roughly the same time. How do I do this?

The code I have is something like:

number_of_threads equ 4

temp:
      dd 0            ;a 'synchronization variable', must start at 0

THREAD 1 code

;synchronization [S]

lock add dword [temp], 0x1            ;atomically count this thread's arrival
wloop1:
cmp dword [temp], number_of_threads   ;spin until all threads have arrived
jne wloop1

THREAD 2 code

;synchronization [S]

lock add dword [temp], 0x1
wloop2:
cmp dword [temp], number_of_threads
jne wloop2

THREAD 3 code

;synchronization [S]

lock add dword [temp], 0x1
wloop3:
cmp dword [temp], number_of_threads
jne wloop3

THREAD 4 code

;synchronization [S]

lock add dword [temp], 0x1
wloop4:
cmp dword [temp], number_of_threads
jne wloop4

This way we make sure that all threads reach [S] and continue from there at roughly the same time. The code that follows [S] executes only once temp equals number_of_threads. Is there a problem with this code, such as a race condition? I am not even sure this is the right way to do it.

score 1 · Accepted Answer · edited May 23 '17 at 12:20


That's one way to do it, and I don't see a race condition. It sure ties up your threads, though, with busy waiting. Not bad if the wait is expected to be very brief, but for waits longer than a millisecond or so, you really should use an OS-supplied synchronization primitive. Spinning that loop while waiting eats CPU cycles like candy, and you're going to notice a performance problem if those waits are very long.

On Windows, you'd use a Synchronization Barrier. There's probably something analogous in the Linux world. I can't say for sure, since I'm not that familiar with Linux programming.

You might be interested in the x86 PAUSE instruction, which tells the processor that the code is in a spin-wait loop and could reduce the CPU load. This answer has a good description.
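For illustration, here is a minimal sketch of the question's spin loop with PAUSE added. It assumes NASM syntax and 32-bit code, and the macro name SYNC_POINT is made up for this example; it is not something from the question. Wrapping the loop in a macro with macro-local labels also avoids having to number the labels per thread.

```nasm
number_of_threads equ 4

temp:
        dd 0                          ; arrival counter, must start at 0

; SYNC_POINT is a hypothetical helper name: expand it at [S] in each thread.
%macro SYNC_POINT 0
        lock add dword [temp], 1      ; atomically count this thread's arrival
%%spin:
        cmp  dword [temp], number_of_threads
        je   %%done                   ; all threads have arrived, carry on
        pause                         ; spin-wait hint to the processor
        jmp  %%spin
%%done:
%endmacro

; THREAD 1 code
;       mov eax, ebx
;       mov ecx, edx
;       SYNC_POINT                    ; [S]
;       mov eax, edx
```

On processors that predate PAUSE, the encoding executes as a plain NOP, so the loop still behaves correctly there. Note that the counter is never reset, so as written this works for a single synchronization point only; reusing it would need extra logic that is not shown here.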


answered Aug 08 '13 at 17:09

Jim Mischel


+1. I would offer the opinion, though, that a second is way too long to sit in a spin lock (on average). A millisecond or two should be the norm - otherwise a higher-level sync primitive should be used. – 500 - Internal Server Error Aug 08 '13 at 17:12
@Jim Mischel Synchronization Barrier is definitely a starting point, the problem is I don't have an OS. I am trying to implement this in assembly alone. – Mathai Aug 08 '13 at 17:19
@JimMischel thanks, so it looks like the easiest is to have a PAUSE instruction before CMP. – Mathai Aug 08 '13 at 19:03
