I'm investigating the execution flow of a OpenMP program linked to libgomp. It uses the #pragma omp parallel for
. I already know that this construct becomes, among other things, a call to GOMP_parallel
function, which is implemented as follows:
void
GOMP_parallel (void (*fn) (void *), void *data,
unsigned num_threads, unsigned int flags)
{
num_threads = gomp_resolve_num_threads (num_threads, 0);
gomp_team_start (fn, data, num_threads, flags, gomp_new_team (num_threads));
fn (data);
ialias_call (GOMP_parallel_end) ();
}
When executing objdump -d
on libgomp, GOMP_parallel
appears as:
000000000000bc80 <GOMP_parallel@@GOMP_4.0>:
bc80: 41 55 push %r13
bc82: 41 54 push %r12
bc84: 41 89 cd mov %ecx,%r13d
bc87: 55 push %rbp
bc88: 53 push %rbx
bc89: 48 89 f5 mov %rsi,%rbp
bc8c: 48 89 fb mov %rdi,%rbx
bc8f: 31 f6 xor %esi,%esi
bc91: 89 d7 mov %edx,%edi
bc93: 48 83 ec 08 sub $0x8,%rsp
bc97: e8 d4 fd ff ff callq ba70 <GOMP_ordered_end@@GOMP_1.0+0x70>
bc9c: 41 89 c4 mov %eax,%r12d
bc9f: 89 c7 mov %eax,%edi
bca1: e8 ca 37 00 00 callq f470 <omp_in_final@@OMP_3.1+0x2c0>
bca6: 44 89 e9 mov %r13d,%ecx
bca9: 44 89 e2 mov %r12d,%edx
bcac: 48 89 ee mov %rbp,%rsi
bcaf: 48 89 df mov %rbx,%rdi
bcb2: 49 89 c0 mov %rax,%r8
bcb5: e8 16 39 00 00 callq f5d0 <omp_in_final@@OMP_3.1+0x420>
bcba: 48 89 ef mov %rbp,%rdi
bcbd: ff d3 callq *%rbx
bcbf: 48 83 c4 08 add $0x8,%rsp
bcc3: 5b pop %rbx
bcc4: 5d pop %rbp
bcc5: 41 5c pop %r12
bcc7: 41 5d pop %r13
bcc9: e9 32 ff ff ff jmpq bc00 <GOMP_parallel_end@@GOMP_1.0>
bcce: 66 90 xchg %ax,%ax
First, there isn't any call to GOMP_ordered_end
in the source code of GOMP_parallel
, for example. Second, that function consists of:
void
GOMP_ordered_end (void)
{
}
According the the objdump output, this function starts at ba00
and finishes at bbbd
. How could it have so much code in a function that is empty? By the way, there is comment in the source code of libgomp saying that it should appear only when using the ORDERED construct (as the name suggests), which is not the case of my test.
Finally, the main concern here for me is: why does the source code differ so much from the disassembly? Why, for example, isn't there any mention to gomp_team_start
in the assembly?
The system has gcc version 5.4.0