Assembly instruction to replace openmp critical region

Question

I have an array of elements which are processed by openmp tasks. It is possible that a task might add new elements at the end of the array. Of course, these elements must be processed as well and could spawn new items. Currently I'm using this code

int p;
#pragma omp critical
{
    p=l.n++;
}

This just reserves a place at the end of the array. The type of l is

struct list
{
    int n;
    double *e;
}

and p will be used as an index to where to store the new element. I was wondering if there is a way to perform this operation without using a critical region. Is there an assembly instruction which copies a value and then increments the original atomically?

The code will be executed on a nehalem cpu, no need to worry about older machines

score 6 · Accepted Answer · edited May 23 '17 at 12:07

6

#pragma omp atomic capture
p = l.n++;

This should use atomic an increment while capturing the value, if the hardware supports it.

Read more about #pragma omp atomic in this question: openMP, atomic vs critical?

And here is Intel's documentation for #pragma omp atomic.

I tried compiling a minimal example with gcc -fopenmp -m32 -O2 -S:

int i, j;
void foo (void)
{
  #pragma omp atomic capture
  i = j++;
}

What I got was a simple atomic 'fetch and add', which is what we want:

movl $1, %eax       # eax = 1
lock xaddl %eax, j  # atomic {swap (eax,j); j = eax + j;}
movl %eax, i        # i = eax
ret

edited May 23 '17 at 12:07

Community

1
1

answered Sep 11 '12 at 09:48

ArjunShankar

23,020
5
61
83

Doesn't work. Compiler (icc) says error: Illegal operation in atomic expression. That's why I was looking for an assembly instruction – Patrik Sep 11 '12 at 09:55
@Patrik - Apparently, by default `#pragma omp atomic` can only do atomic updates. What you need is `#pragma omp atomic capture`. I have edited the answer to reflect this. – ArjunShankar Sep 11 '12 at 10:03
@Patrik - I am curious to know if the `capture` clause fixed the error in ICC. At least the copy of GCC that I have right now is old enough that it does not implement the clause, so I cannot verify it myself. – ArjunShankar Sep 11 '12 at 15:02
It does. Maybe it's what I'm looking for. I want to avoid locking – Patrik Sep 12 '12 at 07:35
@Patrik - I just checked with GCC. Seems to do the right thing. It generates an 'atomic fetch and add' which is what we want. [See this](http://gist.github.com/3742251). – ArjunShankar Sep 18 '12 at 09:36

score 1 · Answer 2 · answered Sep 11 '12 at 10:14

Yes, there are a few possible choices on x86.

XADD r/m, r

This instruction atomically adds the second operand (r) to the first operand (r/m) and loads the second operand (r) with the original value of the first operand (r/m).

To use it you will need to load the second operand with the amount of the increment (I'm guessing, 1, here) and the first operand should be the memory location of what's being incremented.

This instruction must be preceded with the LOCK prefix (it will make it atomic).

The InterlockedAdd() function in Microsoft Visual C++ does this and, AFAIR, uses XADD if it's available (available since i80486).

Another way is to use a loop with the CMPXCHG instruction...

Pseudocode:

while (true)
{
  int oldValue = l.n;
  int newValue = oldValue + 1;
  if (CAS(&l.n, newValue, oldValue) == oldValue)
    break;
}

The CAS(), which stands for Compare And Swap (a common term in concurrent programming), is a function that tries to atomically replace a value in memory with a new value. The replacement succeeds when the value being replaced is equal to the last supplied parameter, oldValue. It fails otherwise. CAS returns the original value from the memory, which lets us know whether the replacement has been successful (we compare the returned value with oldValue). The failure (the returned original value differs from oldValue) indicates that between reading oldValue and the moment we tried to replace it with newValue another thread changed the value in the memory. In this case we simply retry the whole procedure.

The CMPXCHG instruction is the x86 CAS.

In Microsoft Visual C++ InterlockedCompareExchange() uses CMPXCHG to implement the CAS.

If XADD isn't available, InterlockedAdd() is implemented using the CAS/CMPXCHG/InterlockedCompareExchange().

On some other CPUs there can be other possibilities. Some allow atomic execution of a few adjacent instructions.

score 0 · Answer 3 · answered Sep 11 '12 at 10:05

0

That's really just an atomic increment that returns the result, which looks like this:

mov p, 1  ; p must be a register
lock xadd [l.n], p

And now you know. I see no reason to actually use that though, there are ways to do it without resorting to assembly code.

answered Sep 11 '12 at 10:05

harold

61,398
6
86
164

I thought assembly was the most natural choice since every instruction is atomic – Patrik Sep 11 '12 at 11:43
2

@Patrik well yes somewhere down the line an instruction must be used - but that's true of everything. As usual though, it can be abstracted. Also, not all instructions are atomic. – harold Sep 11 '12 at 12:07

Assembly instruction to replace openmp critical region

3 Answers3